Here's a program that will make your brain hurt:
class C<E> { }
class G<X,Y,Z> { }
class P { }
class Q { }
public class Test {
static void foo(Object o) {
System.out.print("Object ");
}
static <T> void foo(G<? extends T, ? extends T, ? extends T> g) {
System.out.print("G ");
}
static public void main(String[] args) {
foo(new G<C<? super P>, C<P>, C<Q>>());
foo(new G<C<? super P>, C<Q>, C<P>>());
foo(new G<C<P>, C<? super P>, C<Q>>());
foo(new G<C<Q>, C<? super P>, C<P>>());
foo(new G<C<P>, C<Q>, C<? super P>>());
foo(new G<C<Q>, C<P>, C<? super P>>());
System.out.println();
}
}
Quick, what does it print? No cheating!
For what it's worth, Sun's reference javac
,
version 1.5.007 produces a .class
file that prints
G G G G G G
whereas Eclipse SDK 3.2.1 creates one that prints
Object Object G G Object Object
Normally this would lead one to point at one of the compilers and shout
BUG! – but after several hours of close reading the Java
Language Specification I have come to the conclusion that both behaviors
adhere to the letter of the specification. In other words, it is
entirely up to the whims of the compiler which of the foos
gets called in each of the six calls.
This is somewhat remarkable. Sun has otherwise gone to great pains specifying
exactly what a Java program is supposed to mean, modulo the a few
clearly defined areas where the virtual machine – not the compiler!
– is explicitly given discretion. But here is a completely
straightforward single-threaded program where (so I assert)
the language underspecifies which method is supposed to be called.
Such nondeterminism makes it hard to do static analysis of Java source code.
What happens?
Broadly speaking, generics happen. The half of generics that gets all
of the press is parameterized types, but the really hairy stuff only
begins to happen when we consider parameterized methods, too.
The reason for this is the the programmer always needs to write down
the type parameter explicitly in order to instantiate a generic
type – but the designers of Java's generics decided that
explicit type arguments should not be necessary for calling a generic
method. It is possible to give type parameters explicitly in the code,
but in the common case, the compiler must try to infer appropriate
type arguments given the types of the ordinary arguments.
And this is fairly hard. In fact, because subtyping gets into play, it
is so hard that the language makes no claim that the compiler can
always find appropriate type arguments when you want to call
a parameterized method. The language specification itself remarks,
in the middle of the 16 pages of extremely technical definition of
how the type inference works:
[...] type inference does not affect soundness in any way. If the types
inferred are nonsensical, the invocation will yield a type error. The
type inference algorithm should be viewed as a heuristic, designed to
perfdorm well in practice. If it fails to infer the
desired result, explicit type paramneters may be used instead.
Still, however, one would expect the inference to be deterministic
such that one would not risk changing the behavior of a program just by
recompiling with a new version of the compiler. But perhaps "perfdorm" and
"paramneters" is a hint that this section has not received the most thorough
of proofreadings before it went to press.
In the example program above, the type inference eventually computes
that T
should be instantiated to the
"least containing invocation" of an unordered set
of the three types C<? super P>
,
C<P>
, and C<Q>
. I now quote:
[...] lci, the least containing invocation is defined
lci(S) = lci(e1, ..., en)
where ei in S,
1≤i≤n
lci(e1, ..., en)
= lci(lci(e1, e2), e3, ..., en)
lci(G<X1, ..., Xn>, G<Y1, ..., Yn>)
= G<lcta(X1, Y1),..., lcta(Xn, Yn)>
where lcta() is the the least containing type argument function defined
(assuming U and V are type expressions) as:
lcta(U, V) = U if U = V, ? extends lub(U, V) otherwise
lcta(U, ? extends V) = ? extends lub(U, V)
lcta(U, ? super V) = ? super glb(U, V)
lcta(? extends U, ? extends V) = ? extends lub(U, V)
lcta(? extends U, ? super V) = U if U = V, ? otherwise
lcta(? super U, ? super V) = ? super glb(U, V)
[The Java Language Specification, third edition, p. 465].
Read the definition of lci carefully. The first line says that we
arrange our three types in some unspecified order. The second line says
to combine two types at a time using a two-argument version of lci.
And the two-argument lci in the third line just distribute
lcta over all the type arguments. (We know from earlier in the
type inference that all arguments to lci are instances of the
same parameterized type).
This would be a nice and standard construction if only lcta
(and by extension the two-argument lci) were commutative and
associative. It is indeed commutative – it has to be, for the cases
given in the definition only make sense if we understand implicitly that
we are to take their commutative closure. But it is manifestly not
associative. To wit:
(? super P
) lcta (P
lcta Q
)
= (? super P
) lcta (? extends Object
)
= ?
whereas
((? super P
) lcta P
) lcta Q
= (? super P
) lcta Q
= (? super P & Q
)
In the former case, the parameterized foo
is not applicable
to the call with T = C<?>
. Therefore, compile-time overload
resolution decides on the less specific but viable foo(Object)
instead. But when T
is C<? super P & Q>
, the parameterized call is applicable.
How clever of javac
always to choose the evaluation order
that reaches the better result! In fact, I suspect it of cheating and
using a smarter multi-argument lcta computation for each type-argument
position, instead of selecting on a common order of all lci arguments.
Extending the example program to test this hypothesis is left an exercise
for the reader.
(Also left for the reader is to figure out the exact meaning and properties
of ? super P & Q
, a possibility not hinted at anywhere in
the JLS except for the two occurrences of "? super glb(U,B)" in the
definition of lcta).