2008-11-08

Java 5.0 type inference is underspecified

Here's a program that will make your brain hurt:

class C<E> { }
class G<X,Y,Z> { }
class P { }
class Q { }

public class Test {
   static void foo(Object o) {
      System.out.print("Object ");
   }
   static <T> void foo(G<? extends T, ? extends T, ? extends T> g) {
      System.out.print("G ");
   }
   static public void main(String[] args) {
      foo(new G<C<? super P>, C<P>, C<Q>>());
      foo(new G<C<? super P>, C<Q>, C<P>>());
      foo(new G<C<P>, C<? super P>, C<Q>>());
      foo(new G<C<Q>, C<? super P>, C<P>>());
      foo(new G<C<P>, C<Q>, C<? super P>>());
      foo(new G<C<Q>, C<P>, C<? super P>>());
      System.out.println();
   }
}

Quick, what does it print? No cheating!

For what it's worth, Sun's reference javac, version 1.5.007, produces a .class file that prints

G G G G G G 

whereas Eclipse SDK 3.2.1 creates one that prints

Object Object G G Object Object 

Normally this would lead one to point at one of the compilers and shout BUG! – but after several hours of close reading of the Java Language Specification I have come to the conclusion that both behaviors adhere to the letter of the specification. In other words, it is entirely up to the whims of the compiler which of the foos gets called in each of the six calls.

This is somewhat remarkable. Sun has otherwise gone to great pains specifying exactly what a Java program is supposed to mean, modulo a few clearly defined areas where the virtual machine – not the compiler! – is explicitly given discretion. But here is a completely straightforward single-threaded program where (so I assert) the language underspecifies which method is supposed to be called.

Such nondeterminism makes it hard to do static analysis of Java source code. What happens?

Broadly speaking, generics happen. The half of generics that gets all of the press is parameterized types, but the really hairy stuff only begins to happen when we consider parameterized methods, too. The reason for this is that the programmer always needs to write down the type arguments explicitly in order to instantiate a generic type – but the designers of Java's generics decided that explicit type arguments should not be necessary for calling a generic method. It is possible to give type arguments explicitly in the code, but in the common case, the compiler must try to infer appropriate type arguments given the types of the ordinary arguments.
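To make the distinction concrete, here is a minimal sketch of the two ways to call a generic method. The class TypeArgDemo and the method twoOf are hypothetical names invented for this illustration; they do not appear in the example program above.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeArgDemo {
    // Hypothetical generic method: T is a type parameter of the method itself.
    public static <T> List<T> twoOf(T a, T b) {
        List<T> out = new ArrayList<T>();
        out.add(a);
        out.add(b);
        return out;
    }

    public static void main(String[] args) {
        // Common case: no type argument is written, so the compiler
        // has to infer T from the types of the ordinary arguments.
        List<String> inferred = twoOf("x", "y");

        // Explicit type argument: T is written down, so nothing is inferred.
        List<Object> explicit = TypeArgDemo.<Object>twoOf("x", Integer.valueOf(1));

        System.out.println(inferred + " " + explicit);
    }
}
```

The first call is the one that drags type inference into the picture; the second bypasses it entirely.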

And this is fairly hard. In fact, because subtyping gets into play, it is so hard that the language makes no claim that the compiler can always find appropriate type arguments when you want to call a parameterized method. The language specification itself remarks, in the middle of the 16 pages of extremely technical definition of how the type inference works:

[...] type inference does not affect soundness in any way. If the types inferred are nonsensical, the invocation will yield a type error. The type inference algorithm should be viewed as a heuristic, designed to perfdorm well in practice. If it fails to infer the desired result, explicit type paramneters may be used instead.

Still, one would expect the inference to be deterministic, so that one would not risk changing the behavior of a program just by recompiling with a new version of the compiler. But perhaps "perfdorm" and "paramneters" are a hint that this section has not received the most thorough of proofreadings before it went to press.

In the example program above, the type inference eventually computes that T should be instantiated to the "least containing invocation" of an unordered set of the three types C<? super P>, C<P>, and C<Q>. I now quote:

[...] lci, the least containing invocation is defined

lci(S) = lci(e1, ..., en) where ei in S, 1 ≤ i ≤ n
lci(e1, ..., en) = lci(lci(e1, e2), e3, ..., en)
lci(G<X1, ..., Xn>, G<Y1, ..., Yn>) = G<lcta(X1, Y1),..., lcta(Xn, Yn)>

where lcta() is the least containing type argument function defined (assuming U and V are type expressions) as:

lcta(U, V) = U if U = V, ? extends lub(U, V) otherwise
lcta(U, ? extends V) = ? extends lub(U, V)
lcta(U, ? super V) = ? super glb(U, V)
lcta(? extends U, ? extends V) = ? extends lub(U, V)
lcta(? extends U, ? super V) = U if U = V, ? otherwise
lcta(? super U, ? super V) = ? super glb(U, V)

[The Java Language Specification, third edition, p. 465]. Read the definition of lci carefully. The first line says that we arrange our three types in some unspecified order. The second line says to combine two types at a time using a two-argument version of lci. And the two-argument lci in the third line just distributes lcta over all the type arguments. (We know from earlier in the type inference that all arguments to lci are instances of the same parameterized type.)

This would be a nice and standard construction if only lcta (and by extension the two-argument lci) were commutative and associative. It is indeed commutative – it has to be, for the cases given in the definition only make sense if we understand implicitly that we are to take their commutative closure. But it is manifestly not associative. To wit:

(? super P) lcta (P lcta Q) = (? super P) lcta (? extends Object) = ?

whereas

((? super P) lcta P) lcta Q = (? super P) lcta Q = (? super P & Q)

In the former case, the parameterized foo is not applicable to the call with T = C<?>. Therefore, compile-time overload resolution decides on the less specific but viable foo(Object) instead. But when T is C<? super P & Q>, the parameterized call is applicable.
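The two evaluation orders can be replayed mechanically. What follows is only a toy model of the quoted lcta rules, not how either compiler works. It assumes that distinct classes are unrelated (so their lub is Object), models glb(U, V) as the intersection U & V, and treats a bare ? as ? extends Object when it appears as an input. LctaDemo, Arg and Kind are names made up for this sketch.

```java
import java.util.Set;
import java.util.TreeSet;

public class LctaDemo {
    enum Kind { EXACT, EXTENDS, SUPER, ANY }

    /** Toy type argument: a wildcard kind plus the class names in its bound. */
    static final class Arg {
        final Kind kind;
        final Set<String> names; // several names model an intersection such as P & Q

        Arg(Kind kind, Set<String> names) { this.kind = kind; this.names = names; }

        static Arg of(Kind kind, String name) {
            Set<String> s = new TreeSet<String>();
            s.add(name);
            return new Arg(kind, s);
        }

        @Override public String toString() {
            StringBuilder b = new StringBuilder();
            for (String n : names) {
                if (b.length() > 0) b.append(" & ");
                b.append(n);
            }
            switch (kind) {
                case EXACT:   return b.toString();
                case EXTENDS: return "? extends " + b;
                case SUPER:   return "? super " + b;
                default:      return "?";
            }
        }
    }

    static final Arg UNBOUNDED = new Arg(Kind.ANY, new TreeSet<String>());

    // Assumption: distinct classes share no supertype other than Object.
    static Set<String> lub(Set<String> u, Set<String> v) {
        return u.equals(v) ? u : Arg.of(Kind.EXTENDS, "Object").names;
    }

    // Assumption: glb(U, V) is the intersection type U & V.
    static Set<String> glb(Set<String> u, Set<String> v) {
        Set<String> s = new TreeSet<String>(u);
        s.addAll(v);
        return s;
    }

    // Assumption: a bare ? given as input means ? extends Object.
    static Arg normalize(Arg a) {
        return a.kind == Kind.ANY ? Arg.of(Kind.EXTENDS, "Object") : a;
    }

    static Arg lcta(Arg a, Arg b) {
        a = normalize(a);
        b = normalize(b);
        // Commutative closure: order the pair so that a.kind <= b.kind.
        if (a.kind.compareTo(b.kind) > 0) { Arg t = a; a = b; b = t; }
        if (b.kind == Kind.SUPER) {
            if (a.kind == Kind.EXTENDS) {
                // lcta(? extends U, ? super V) = U if U = V, ? otherwise
                return a.names.equals(b.names) ? new Arg(Kind.EXACT, a.names) : UNBOUNDED;
            }
            // lcta(U, ? super V) and lcta(? super U, ? super V)
            return new Arg(Kind.SUPER, glb(a.names, b.names));
        }
        if (b.kind == Kind.EXACT && a.names.equals(b.names)) {
            return a; // lcta(U, V) = U if U = V
        }
        // Every remaining case yields ? extends lub(U, V).
        return new Arg(Kind.EXTENDS, lub(a.names, b.names));
    }

    public static void main(String[] args) {
        Arg superP = Arg.of(Kind.SUPER, "P");
        Arg p = Arg.of(Kind.EXACT, "P");
        Arg q = Arg.of(Kind.EXACT, "Q");
        System.out.println(lcta(superP, lcta(p, q))); // prints: ?
        System.out.println(lcta(lcta(superP, p), q)); // prints: ? super P & Q
    }
}
```

Under these assumptions the model reproduces both results computed by hand above, which is all it is meant to show: the answer depends on how the three arguments are grouped.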

How clever of javac always to choose the evaluation order that reaches the better result! In fact, I suspect it of cheating and using a smarter multi-argument lcta computation for each type-argument position, instead of settling on a common order of all lci arguments. Extending the example program to test this hypothesis is left as an exercise for the reader.

(Also left for the reader is to figure out the exact meaning and properties of ? super P & Q, a possibility not hinted at anywhere in the JLS except for the two occurrences of "? super glb(U, V)" in the definition of lcta.)

Comments:

  1. So, isn't this a bug in the specification, actually? Shouldn't it be reported somewhere?

    I'm not sure I understand why they didn't specify complete type inference. I know it can get exponential in bad cases, but they could have said “either complete the inference or stop with an error”, i.e. if it's too hard and it takes too long.

    That would have the same effect of having to specify types manually sometimes, but if a program is compiled, the result would be deterministic.

    I also encountered a case where neither javac nor Eclipse can figure out the correct type. Based on their error messages, they seem to reach different conclusions, but I can't tell if the Java specification would allow inferring the correct types.

    Given:

    <T extends Comparable<? super T>,
      V extends Iterable<T>>
    Comparator<V> iterableComparator() { ... }

    <T> boolean isOrdered(
      Iterable<T> values,
      Comparator<? super T> comparator) { ... }

    Then the following is valid:

    void main() {
      List<List<Integer>> list = null;
      isOrdered(list,
        this.<Integer, List<Integer>>
        iterableComparator());
    }

    However, if the explicit type arguments to iterableComparator are not given, neither compiler accepts the call.

    javac infers T,V = java.lang.Comparable<? super T>,java.lang.Object
    Both Eclipse and gcj infer V = Iterable<Comparable<? super T>>, but don't mention T.

  2. Wow, that looks super complicated. I'd never be able to write a program like this by myself. Well done and thanks for sharing.

  3. So, this is a dumbed-down way to explain this: there is an ordered sequence of rational exponent resolution... even more plainly: there is an order of preference for the example script's resolutions, which Sun will not share since only one or two people know what we are talking about.

    Great find, please consider that this script is not needed - it is a way to exploit an unnamed trait of the software. See JavaScript 1.0Beta to learn about other unnamed traits.

    :)
