String's Maximum length in Java - calling length() method
Considering that the String class' length method returns an int, the maximum length that would be returned by the method would be Integer.MAX_VALUE, which is 2^31 - 1 (or approximately 2 billion).
In terms of lengths and indexing of arrays (such as char[], which is probably the way the internal data representation is implemented for Strings), Chapter 10: Arrays of The Java Language Specification, Java SE 7 Edition says the following:
The variables contained in an array have no names; instead they are referenced by array access expressions that use nonnegative integer index values. These variables are called the components of the array. If an array has n components, we say n is the length of the array; the components of the array are referenced using integer indices from 0 to n - 1, inclusive.
Furthermore, the indexing must be by int values, as mentioned in Section 10.4:
Arrays must be indexed by int values;
Therefore, it appears that the limit is indeed 2^31 - 1, as that is the maximum value for a nonnegative int value.
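As a quick check of the arithmetic above, here is a minimal sketch (the class and method names are illustrative, not from any library) that derives the largest nonnegative int from its bit width and compares it with Integer.MAX_VALUE:

```java
public class IntLimitSketch {
    // Largest nonnegative int, computed as 2^31 - 1 in long arithmetic
    // so the intermediate value 2^31 does not overflow.
    static int maxNonNegativeInt() {
        return (int) ((1L << 31) - 1);
    }

    public static void main(String[] args) {
        // String.length() returns an int, so it can never report more than this.
        System.out.println(maxNonNegativeInt() == Integer.MAX_VALUE); // prints "true"
        System.out.println(Integer.MAX_VALUE);                        // prints "2147483647"
    }
}
```

This only shows the upper bound implied by the int type; it says nothing about how large an array a given JVM will actually allocate.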
However, there probably are going to be other limitations, such as the maximum allocatable size for an array.
java.io.DataInput.readUTF() and java.io.DataOutput.writeUTF(String) say that a String object is represented by two bytes of length information and the modified UTF-8 representation of every character in the string. It follows that, when a String is used with DataInput and DataOutput, its length is limited by the number of bytes of its modified UTF-8 representation, which has to fit in that two-byte length field (at most 65535 bytes).
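The limit can be observed directly with DataOutputStream. The sketch below is an illustration only: the class name and the helpers tryWriteUtf and ascii are made up for this example, while writeUTF and UTFDataFormatException are the standard java.io ones.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class WriteUtfLimitSketch {
    // Builds a string of n ASCII 'a' characters (one byte each in modified UTF-8).
    static String ascii(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'a');
        return new String(chars);
    }

    // Returns the number of bytes writeUTF produced, or -1 if it rejected the string.
    static int tryWriteUtf(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (UTFDataFormatException e) {
            return -1; // encoded form does not fit in the two-byte length field
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        System.out.println(tryWriteUtf(ascii(65535))); // prints "65537": 2 length bytes + 65535 data bytes
        System.out.println(tryWriteUtf(ascii(65536))); // prints "-1": one byte too many
        System.out.println(tryWriteUtf("\u00e9"));     // prints "4": 2 length bytes + a 2-byte character
    }
}
```

Note that the cutoff depends on the encoded byte count, so a string with non-ASCII characters hits the limit at fewer than 65535 characters.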
In addition, the specification of CONSTANT_Utf8_info found in the Java Virtual Machine Specification defines the structure as follows.
CONSTANT_Utf8_info {
u1 tag;
u2 length;
u1 bytes[length];
}
You can find that the size of 'length' is two bytes.
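To make the layout concrete, here is a rough sketch of reading one such entry from a byte array. It assumes the entry is laid out exactly as in the structure above; the class and method names are hypothetical, and the tag value 1 is the one the JVM specification assigns to CONSTANT_Utf8.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class Utf8InfoSketch {
    static final int CONSTANT_UTF8_TAG = 1; // tag for CONSTANT_Utf8 entries
    static final int MAX_U2 = 0xFFFF;       // largest value a u2 field can hold (65535)

    // Reads one CONSTANT_Utf8_info entry and returns its raw modified UTF-8 bytes.
    static byte[] readUtf8Info(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();     // u1 tag
            if (tag != CONSTANT_UTF8_TAG) {
                throw new IllegalArgumentException("not a CONSTANT_Utf8_info entry, tag " + tag);
            }
            int length = in.readUnsignedShort(); // u2 length, always in 0..65535
            byte[] bytes = new byte[length];     // u1 bytes[length]
            in.readFully(bytes);
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] entry = {1, 0, 2, (byte) 'h', (byte) 'i'}; // tag 1, length 2, bytes "hi"
        System.out.println(readUtf8Info(entry).length);   // prints "2"
        System.out.println(MAX_U2);                       // prints "65535"
    }
}
```

Because the length is read with readUnsignedShort, no entry can describe more than 65535 bytes of string data.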
That the return type of a certain method (e.g. String.length()) is int does not always mean that its allowed maximum value is Integer.MAX_VALUE. Instead, in most cases, int is chosen just for performance reasons. The Java Language Specification says that integers whose size is smaller than that of int are converted to int before calculation (if my memory serves me correctly), and that is one reason to choose int when there is no special reason to do otherwise.
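The promotion to int can be seen in a small experiment. This is just a sketch with made-up names; it relies only on the language rule that arithmetic on byte operands is carried out as int.

```java
public class PromotionSketch {
    // a + b is computed as an int, so boxing the result yields an Integer, not a Byte.
    static String typeOfSum(byte a, byte b) {
        Object sum = a + b;
        return sum.getClass().getSimpleName();
    }

    // The int result does not wrap around at the byte range.
    static int sumBytes(byte a, byte b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(typeOfSum((byte) 100, (byte) 100)); // prints "Integer"
        System.out.println(sumBytes((byte) 100, (byte) 100));  // prints "200"
    }
}
```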
Because the length field is two bytes, the maximum length of a string constant at compilation time is at most 65535. Note again that the length is the number of bytes of the modified UTF-8 representation, not the number of characters in a String object.
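The difference between byte count and character count can be estimated with a helper like the one below. It is a sketch, not library code: the names are invented here, and the per-character sizes follow the modified UTF-8 rules described in the DataOutput.writeUTF documentation (1 byte for U+0001 to U+007F, 2 bytes for U+0000 and U+0080 to U+07FF, 3 bytes for everything else, with each surrogate encoded separately).

```java
public class ModifiedUtf8LengthSketch {
    // Number of bytes the string occupies in modified UTF-8 (excluding the length prefix).
    static long modifiedUtf8Length(String s) {
        long total = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                total += 1;
            } else if (c <= 0x07FF) {
                total += 2; // also covers U+0000, which is never written as a single zero byte
            } else {
                total += 3; // includes each half of a surrogate pair
            }
        }
        return total;
    }

    // True if the encoded string fits in a two-byte length field.
    static boolean fitsInU2Length(String s) {
        return modifiedUtf8Length(s) <= 0xFFFF;
    }

    public static void main(String[] args) {
        System.out.println(modifiedUtf8Length("abc"));          // prints "3"
        System.out.println(modifiedUtf8Length("\u20ac"));       // prints "3" (one char, three bytes)
        System.out.println(modifiedUtf8Length("\uD83D\uDE00")); // prints "6" (surrogate pair)
    }
}
```

A string made only of characters from the three-byte range would therefore reach the 65535-byte limit at 21845 characters.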
String objects may be able to hold many more characters at runtime. However, if you want to use String objects with the DataInput and DataOutput interfaces, it is better to avoid overly long String objects. I found this limitation when I implemented Objective-C equivalents of DataInput.readUTF() and DataOutput.writeUTF(String).
Since arrays must be indexed with integers, the maximum length of an array is Integer.MAX_VALUE (2^31 - 1, or 2 147 483 647). This is assuming you have enough memory to hold an array of that size, of course.