String concatenation with null

The previous post showed how to observe the default values that static fields receive during the preparation phase of class loading. A primitive int was used as an example. Very similar behavior was described in Puzzle 49: Larger Than Life from Java Puzzlers: Traps, Pitfalls, and Corner Cases by Joshua Bloch and Neal Gafter.
When it comes to reference types, the default value is null, which can result in a runtime exception when dereferenced. Let’s consider the class introduced in the previous post, but this time with a static field of String type.

public class UnexpectedStringValue {
    static final UnexpectedStringValue instance = new UnexpectedStringValue();

    static String DEFAULT_VALUE = "world";

    String member;

    public UnexpectedStringValue() {
        this.member = "Hello " + DEFAULT_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(instance.member);
    }
}

Running the main method of UnexpectedStringValue prints Hello null to standard output, because instance is created before DEFAULT_VALUE is assigned "world". The Java compiler replaces a string concatenation expression with a chain of StringBuilder.append() calls (javac did so up to Java 8; since Java 9 it emits an invokedynamic instruction by default). In the presented snippet javac has an easy task, because the type of DEFAULT_VALUE is known up front. Consequently, the call is bound to StringBuilder.append(String), which appends the four characters null when it is given a null reference. A peek at the generated bytecode confirms this hypothesis. The following excerpt from the decompiled class file corresponds to the string concatenation in UnexpectedStringValue's constructor.

INVOKESPECIAL java/lang/StringBuilder.<init> ()V
LDC "Hello "
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
GETSTATIC pl/ciruk/blog/preparation/UnexpectedStringValue.DEFAULT_VALUE : Ljava/lang/String;
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/String;)Ljava/lang/StringBuilder;
INVOKEVIRTUAL java/lang/StringBuilder.toString ()Ljava/lang/String;
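
Written back as source code, the excerpt above is roughly equivalent to the sketch below. The class name ConcatenationDesugared and the unassigned field suffix are assumptions made for illustration; the field only stands in for DEFAULT_VALUE at the moment it still holds null.

```java
class ConcatenationDesugared {
    // Never assigned, so it keeps the default value of a reference type: null.
    static String suffix;

    // The form written in the constructor from the post.
    static String viaOperator() {
        return "Hello " + suffix;
    }

    // Hand-written equivalent of the bytecode excerpt:
    // <init>, append(String), append(String), toString.
    static String viaBuilder() {
        return new StringBuilder()
                .append("Hello ")
                .append(suffix)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(viaOperator()); // Hello null
        System.out.println(viaBuilder());  // Hello null
    }
}
```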

What would the compiler do if the expression in the constructor were altered to append null directly? The following code compiles successfully and, when executed, produces the same result as the code before.

public UnexpectedStringValue() {
    this.member = "Hello " + null;
}

What did the compiler actually do? This time the task was more complicated. To understand why, take a look at the following expression, which is not valid: the compiler rejects it and reports that the reference to append is ambiguous.

new StringBuilder().append(null);
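
The ambiguity disappears as soon as the null literal is given a static type. A minimal sketch, assuming a hypothetical class named AppendNullOverloads, in which an explicit cast selects the overload by hand:

```java
class AppendNullOverloads {
    // The cast selects StringBuilder.append(String).
    static String appendAsString() {
        return new StringBuilder().append((String) null).toString();
    }

    // The cast selects StringBuilder.append(Object).
    static String appendAsObject() {
        return new StringBuilder().append((Object) null).toString();
    }

    public static void main(String[] args) {
        System.out.println(appendAsString()); // null
        System.out.println(appendAsObject()); // null
    }
}
```

Both overloads append the text null, so either choice would give the same output for a bare null.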

Upon inspection of the bytecode generated for "Hello " + null we can see that yet another overloaded method was called.

ACONST_NULL
INVOKEVIRTUAL java/lang/StringBuilder.append (Ljava/lang/Object;)Ljava/lang/StringBuilder;

The compiler must have applied a trick to treat the null reference as a reference of Object type. This behavior is dictated by JLS 15.18.1 and 5.1.11, which state that during string concatenation a null reference is converted to the string "null".
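
The difference between string conversion and dereferencing can be checked with a small sketch. The class NullStringConversion and its method names are made up for this illustration.

```java
class NullStringConversion {
    // String conversion during concatenation turns a null reference into "null".
    static String concatenated() {
        Object nothing = null;
        return "Hello " + nothing;
    }

    // String.valueOf(Object) applies the same rule; the cast avoids the char[] overload.
    static String converted() {
        return String.valueOf((Object) null);
    }

    // Dereferencing the very same reference fails at runtime.
    static boolean dereferencingThrows() {
        Object nothing = null;
        try {
            nothing.toString();
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(concatenated());        // Hello null
        System.out.println(converted());           // null
        System.out.println(dereferencingThrows()); // true
    }
}
```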

Observing class loading preparation phase

During compilation, javac translates a Java class into a sequence of instructions and corresponding metadata. This intermediate representation allows the JVM to dynamically extend the runtime environment with custom classes and to execute code written in different languages (e.g. Scala or Groovy). A *.class file can be considered a key-value pair consisting of a specific name and the corresponding binary representation.
To be able to execute any code, the JVM needs to find that representation and read it, which is called loading.
A loaded class is not usable at that point, because it may be invalid or it may refer to external types. Putting the loaded class into the runtime context is called linking.
The next step, initialization, assigns the declared values to the static variables of the class and runs its static initialization blocks.
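
That initialization is a separate step from loading can be seen with Class.forName(String, boolean, ClassLoader), whose second argument decides whether the class is initialized. The sketch below is only an illustration; the class LoadingVsInitialization, its nested class Lazy and the flag lazyInitialized are hypothetical names.

```java
class LoadingVsInitialization {
    // Set by Lazy's static initializer, so it flips only when Lazy is initialized.
    static boolean lazyInitialized;

    static class Lazy {
        static {
            lazyInitialized = true;
        }
    }

    // Returns the flag as seen after loading only and after initialization.
    static String observe() {
        ClassLoader loader = LoadingVsInitialization.class.getClassLoader();
        String name = LoadingVsInitialization.class.getName() + "$Lazy";
        try {
            Class.forName(name, false, loader); // load, but do not initialize
            boolean afterLoading = lazyInitialized;
            Class.forName(name, true, loader);  // initialize: static blocks run
            boolean afterInitialization = lazyInitialized;
            return afterLoading + " " + afterInitialization;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(observe()); // false true on the first call
    }
}
```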

The whole process is described in detail in chapter 5 of the Java Virtual Machine Specification. The topic is covered extensively in countless articles and books. It seems, however, that one of the subphases of the linking phase does not get as much coverage as the others.

During the linking phase the JVM performs several activities. For security reasons, the first subphase is responsible for extensive bytecode verification. Preparation consists of creating the static fields and initializing them to their default values (for example: 0 for int, false for boolean, etc.). Constraints are imposed on the return types of overridden methods as well. Finally, the types referenced by the class being linked are resolved, although the JVM is allowed to postpone this until a reference is first used.
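
The defaults assigned during preparation can be captured for several types at once. In the sketch below (the class PreparationDefaults and its fields are assumptions made for illustration) the first static field is initialized through a method that reads the fields declared after it, before their own initializers have run.

```java
class PreparationDefaults {
    // Initialized first in textual order, so snapshot() sees only default values.
    static final String SNAPSHOT = snapshot();

    // Deliberately non-final: compile-time constants would be inlined by javac.
    static int number = 123;
    static boolean flag = true;
    static double ratio = 1.5;
    static String text = "world";

    static String snapshot() {
        return number + " " + flag + " " + ratio + " " + text;
    }

    public static void main(String[] args) {
        System.out.println(SNAPSHOT);   // 0 false 0.0 null
        System.out.println(snapshot()); // 123 true 1.5 world
    }
}
```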

The subphase responsible for initializing static fields to their default values is often forgotten. While imposing type constraints on methods seems like dark arts from a high-level code perspective, the step in which the defaults are set is tangible and can be observed.
The following sample is by no means a good coding practice, since it might introduce unexpected behavior. The class presented below holds an instance of itself as a static member. It has an additional static field and a non-static field as well. The non-static field is computed from the static one, which at that point still holds its default value, zero, because static initializers run in textual order. Changing the order of the static fields would eliminate the error.

public class UnexpectedValue {
    static final UnexpectedValue instance = new UnexpectedValue();

    static int DEFAULT_VALUE = 123;

    int member;

    public UnexpectedValue() {
        this.member = DEFAULT_VALUE * 10;
    }
}

While we’re tempted to think that valueOfMember should be 1230, it actually holds 0, because DEFAULT_VALUE still held its default value when the constructor ran.

@Test
public void shouldResolveToDefaultIntValue() throws Exception {
    int valueOfMember = UnexpectedValue.instance.member;

    assertThat(valueOfMember, is(equalTo(defaultIntValue())));
}

int defaultIntValue() {
    return 0;
}
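
As noted earlier, changing the order of the static fields eliminates the error. A minimal sketch of that reordering, using a hypothetical class named ExpectedValue:

```java
class ExpectedValue {
    // Declared first, so it is assigned 123 before the instance below is created.
    static int DEFAULT_VALUE = 123;

    static final ExpectedValue instance = new ExpectedValue();

    int member;

    ExpectedValue() {
        this.member = DEFAULT_VALUE * 10;
    }

    public static void main(String[] args) {
        System.out.println(instance.member); // 1230
    }
}
```

Static initializers run in textual order, so by the time the constructor executes, DEFAULT_VALUE already holds 123.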