Weird Integer boxing in Java

Integer Cache is a feature that was introduced in Java Version 5 basically for :

  1. Saving of Memory space
  2. Improvement in performance.
Integer number1 = 127;
Integer number2 = 127;

System.out.println("number1 == number2" + (number1 == number2); 

OUTPUT: True


Integer number1 = 128;
Integer number2 = 128;

System.out.println("number1 == number2" + (number1 == number2);

OUTPUT: False

HOW?

Actually when we assign value to an Integer object, it does auto promotion behind the hood.

Integer object = 100;

is actually calling Integer.valueOf() function

Integer object = Integer.valueOf(100);

Nitty-gritty details of valueOf(int)

    public static Integer valueOf(int i) {
        if (i >= IntegerCache.low && i <= IntegerCache.high)
            return IntegerCache.cache[i + (-IntegerCache.low)];
        return new Integer(i);
    }

Description:

This method will always cache values in the range -128 to 127, inclusive, and may cache other values outside of this range.

When a value within range of -128 to 127 is required it returns a constant memory location every time. However, when we need a value thats greater than 127

return new Integer(i);

returns a new reference every time we initiate an object.


Furthermore, == operators in Java is used to compares two memory references and not values.

Object1 located at say 1000 and contains value 6.
Object2 located at say 1020 and contains value 6.

Object1 == Object2 is False as they have different memory locations though contains same values.


public class Scratch
{
   public static void main(String[] args)
    {
        Integer a = 1000, b = 1000;  //1
        System.out.println(a == b);

        Integer c = 100, d = 100;  //2
        System.out.println(c == d);
   }
}

Output:

false
true

Yep the first output is produced for comparing reference; 'a' and 'b' - these are two different reference. In point 1, actually two references are created which is similar as -

Integer a = new Integer(1000);
Integer b = new Integer(1000);

The second output is produced because the JVM tries to save memory, when the Integer falls in a range (from -128 to 127). At point 2 no new reference of type Integer is created for 'd'. Instead of creating a new object for the Integer type reference variable 'd', it only assigned with previously created object referenced by 'c'. All of these are done by JVM.

These memory saving rules are not only for Integer. for memory saving purpose, two instances of the following wrapper objects (while created through boxing), will always be == where their primitive values are the same -

  • Boolean
  • Byte
  • Character from \u0000 to \u007f (7f is 127 in decimal)
  • Short and Integer from -128 to 127

Integer objects in some range (I think maybe -128 through 127) get cached and re-used. Integers outside that range get a new object each time.


The true line is actually guaranteed by the language specification. From section 5.1.7:

If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.

The discussion goes on, suggesting that although your second line of output is guaranteed, the first isn't (see the last paragraph quoted below):

Ideally, boxing a given primitive value p, would always yield an identical reference. In practice, this may not be feasible using existing implementation techniques. The rules above are a pragmatic compromise. The final clause above requires that certain common values always be boxed into indistinguishable objects. The implementation may cache these, lazily or eagerly.

For other values, this formulation disallows any assumptions about the identity of the boxed values on the programmer's part. This would allow (but not require) sharing of some or all of these references.

This ensures that in most common cases, the behavior will be the desired one, without imposing an undue performance penalty, especially on small devices. Less memory-limited implementations might, for example, cache all characters and shorts, as well as integers and longs in the range of -32K - +32K.