2009-02-21

Do you know this tune?

Here's a song I wrote: [lead sheet] [midi] [mp3].

At least I hope I wrote it. I have a nagging suspicion that I may have accidentally plagiarized the first 2 and last 4 measures from somewhere, but I cannot for the life of me locate the source. Perhaps they are original after all. Have you heard them before? It would have to be before 1990, which is when I first remember diddling this.

I'm not posting any lyrics. The meter is common enough that there should be plenty of Western verse that fits the tune. If you think the mood somehow matches Danish poet Halfdan Rasmussen's wonderful Noget om skærsommernætter (which I do not have legal permission to set to music), I assure you that it must be pure coincidence.

Updated 2009-05-10: Horrible typos in chord progression on the lead sheet fixed.

2018-09-07: Plagiarism source found! Lines 1, 7, and 8 here are lines 5, 7, and 4 of Alf Cranner's tune to Den skamløse gamle damen (lyrics by Klaus Hagerup). I must have heard it in a recording by Eddie Skoller (Hugo og de andre, 1979).

2009-02-19

More socket strangeness

Today's fun fact:

If you're on Windows XP, and you have a TCP socket, and you have set SO_SNDBUF to 0 in order to do your own sender buffering, and you have called WSASend to start an overlapped send operation and Windows has called your callback stating that the send succeeded, and the other end of the socket is on the same machine, and that other end has not yet read the data you wrote, and you then call shutdown() to close the sending end of the socket, then the connection will be reset!

It works fine if you never touch SO_SNDBUF at all. But it doesn't help to set SO_SNDBUF to a nonzero value only just before shutdown()...

Downtime

My home server, which hosts trackmap.net, my personal website, all images on this blog, my email inbox, and a few other minor things, died on Boxing Day. It's been dead for seven weeks now.

I'm not blaming it for dying, really – it's been working faithfully round the clock since 2003. Five years of continuous service is not bad for a box I bought for £150, everything but a monitor included. It must have been built from the cheapest components around and never meant for anything but occasional word processing and web surfing. Instead I put it to work as an all-round server and made it run a news server and SpamAssassin in 64 MB of RAM. Cruel me. It took to wheezing loudly in the final year of its life, though.

But gone is gone, which means that I was without email, which means that I was without spam. And I found that so pleasant that it took me seven weeks to gather myself together to buy a replacement. But now it's done and everything should be up and running again. I managed to restore all my data from the old machine's hard drive, which works fine if noisily. All that's lost is whatever email people have been sending me for the last seven weeks. If you have sent me mail and you read this, please resend!

The new server is a Shuttle X27, which cost me twice as much as the old one, but is four times more powerful in all relevant specs. And it's practically noiseless, at least compared to the other sources of noise in my living room – there's a fan inside for the CPU heatsink, but you have to know it's there to notice it.

My only complaint about the X27 is that it was hard to get an OS installed on it. I did not buy an optical drive for it, thinking there would be plenty of other options for getting software into it. But it stubbornly refused to recognize any USB stick as bootable, and setting the BIOS to boot from LAN had no visible effect at all. Eventually I discovered that it was willing to boot Grub from the old server's IDE disk hooked up through an external USB enclosure. Why it did that but would not boot from an ordinary USB solid-state stick is beyond me. Don't both use the same protocol on the USB level?

Anyway, all is well and I am connected again. Sorry for the interruption.

2008-11-16

Trackmap.net changelog

Back in 1996 when the web was young I started a website to host my collection of railway track diagrams. It eventually became trackmap.net. It has been updated in fits and starts – the last time was in early 2006 – but I still plan to get time to do updates in the future, such as the maps from this summer's Frankfurt trip.

Despite the irregular updates, trackmap.net was probably my main claim to Internet fame, until I wrote xcftools (which, incidentally, exists solely because I needed it to make producing track maps easier). Up until now, I have updated a static recent changes page whenever I did something – but seriously, nobody is going to look at that very often to see if there's any news, what with the multi-year hiatuses.

But perhaps some readers would find it useful to subscribe to an RSS/Atom feed for updates to trackmap.net and have their feed reader check it quietly for them? That sounds like a plan. So here is what I'll do: Every time I update trackmap.net (or its subsites) in the future, I'll post about it on this very blog, with the label "trackmap.net".

The label search page becomes the new changelog page, and the feed for new updates is here (Atom) or here (RSS). (Note: These feeds will show only posts related to track maps. If you select the Blogger-provided feed link in the navigation bar, you will get my full blog feed where I blather about various other topics at irregular intervals).

Old changes are still at http://dk.trackmap.net/news.

This posting serves to populate the trackmap.net label so that the links just given will not be empty.

2008-11-08

Java 5.0 type inference is underspecified

Here's a program that will make your brain hurt:

class C<E> { }
class G<X,Y,Z> { }
class P { }
class Q { }

public class Test {
   static void foo(Object o) {
      System.out.print("Object ");
   }
   static <T> void foo(G<? extends T, ? extends T, ? extends T> g) {
      System.out.print("G ");
   }
   static public void main(String[] args) {
      foo(new G<C<? super P>, C<P>, C<Q>>());
      foo(new G<C<? super P>, C<Q>, C<P>>());
      foo(new G<C<P>, C<? super P>, C<Q>>());
      foo(new G<C<Q>, C<? super P>, C<P>>());
      foo(new G<C<P>, C<Q>, C<? super P>>());
      foo(new G<C<Q>, C<P>, C<? super P>>());
      System.out.println();
   }
}

Quick, what does it print? No cheating!

For what it's worth, Sun's reference javac, version 1.5.0_07, produces a .class file that prints

G G G G G G 

whereas Eclipse SDK 3.2.1 creates one that prints

Object Object G G Object Object 

Normally this would lead one to point at one of the compilers and shout BUG! – but after several hours of close reading the Java Language Specification I have come to the conclusion that both behaviors adhere to the letter of the specification. In other words, it is entirely up to the whims of the compiler which of the foos gets called in each of the six calls.

This is somewhat remarkable. Sun has otherwise gone to great pains specifying exactly what a Java program is supposed to mean, modulo a few clearly defined areas where the virtual machine – not the compiler! – is explicitly given discretion. But here is a completely straightforward single-threaded program where (so I assert) the language underspecifies which method is supposed to be called.

Such nondeterminism makes it hard to do static analysis of Java source code. What happens?

Broadly speaking, generics happen. The half of generics that gets all of the press is parameterized types, but the really hairy stuff only begins to happen when we consider parameterized methods, too. The reason for this is that the programmer always needs to write down the type parameter explicitly in order to instantiate a generic type – but the designers of Java's generics decided that explicit type arguments should not be necessary for calling a generic method. It is possible to give type parameters explicitly in the code, but in the common case, the compiler must try to infer appropriate type arguments given the types of the ordinary arguments.
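For illustration (this sketch is mine, not the JLS's; the method name is invented), here is the difference between letting the compiler infer a generic method's type argument and supplying it explicitly:

```java
import java.util.ArrayList;
import java.util.List;

public class ExplicitTypeArgs {
    // A generic method: T is normally inferred from the argument types.
    static <T> List<T> pairList(T a, T b) {
        List<T> list = new ArrayList<T>();
        list.add(a);
        list.add(b);
        return list;
    }

    public static void main(String[] args) {
        // Inferred: the compiler computes T as some common supertype
        // of Integer and String.
        List<?> inferred = pairList(1, "one");

        // Explicit: the caller chooses T = Object, bypassing inference.
        List<Object> explicit = ExplicitTypeArgs.<Object>pairList(1, "one");

        System.out.println(inferred.size() + " " + explicit.size()); // 2 2
    }
}
```

The explicit form is exactly the escape hatch the specification recommends when inference fails to produce the desired result.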

And this is fairly hard. In fact, because subtyping comes into play, it is so hard that the language makes no claim that the compiler can always find appropriate type arguments when you want to call a parameterized method. The language specification itself remarks, in the middle of the 16 pages of extremely technical definition of how the type inference works:

[...] type inference does not affect soundness in any way. If the types inferred are nonsensical, the invocation will yield a type error. The type inference algorithm should be viewed as a heuristic, designed to perfdorm well in practice. If it fails to infer the desired result, explicit type paramneters may be used instead.

Still, one would expect the inference to be deterministic, such that one would not risk changing the behavior of a program just by recompiling with a new version of the compiler. But perhaps "perfdorm" and "paramneters" are a hint that this section did not receive the most thorough of proofreadings before it went to press.

In the example program above, the type inference eventually computes that T should be instantiated to the "least containing invocation" of an unordered set of the three types C<? super P>, C<P>, and C<Q>. I now quote:

[...] lci, the least containing invocation is defined

lci(S) = lci(e1, ..., en) where ei ∈ S, 1 ≤ i ≤ n
lci(e1, ..., en) = lci(lci(e1, e2), e3, ..., en)
lci(G<X1, ..., Xn>, G<Y1, ..., Yn>) = G<lcta(X1, Y1),..., lcta(Xn, Yn)>

where lcta() is the least containing type argument function defined (assuming U and V are type expressions) as:

lcta(U, V) = U if U = V, ? extends lub(U, V) otherwise
lcta(U, ? extends V) = ? extends lub(U, V)
lcta(U, ? super V) = ? super glb(U, V)
lcta(? extends U, ? extends V) = ? extends lub(U, V)
lcta(? extends U, ? super V) = U if U = V, ? otherwise
lcta(? super U, ? super V) = ? super glb(U, V)

[The Java Language Specification, third edition, p. 465]. Read the definition of lci carefully. The first line says that we arrange our three types in some unspecified order. The second line says to combine two types at a time using a two-argument version of lci. And the two-argument lci in the third line just distributes lcta over all the type arguments. (We know from earlier in the type inference that all arguments to lci are instances of the same parameterized type).

This would be a nice and standard construction if only lcta (and by extension the two-argument lci) were commutative and associative. It is indeed commutative – it has to be, for the cases given in the definition only make sense if we understand implicitly that we are to take their commutative closure. But it is manifestly not associative. To wit:

(? super P) lcta (P lcta Q) = (? super P) lcta (? extends Object) = ?

whereas

((? super P) lcta P) lcta Q = (? super P) lcta Q = (? super P & Q)

In the former case, the parameterized foo is not applicable to the call with T = C<?>. Therefore, compile-time overload resolution decides on the less specific but viable foo(Object) instead. But when T is C<? super P & Q>, the parameterized call is applicable.

How clever of javac always to choose the evaluation order that reaches the better result! In fact, I suspect it of cheating and using a smarter multi-argument lcta computation for each type-argument position, instead of picking a single common order of all lci arguments. Extending the example program to test this hypothesis is left as an exercise for the reader.

(Also left for the reader is to figure out the exact meaning and properties of ? super P & Q, a possibility not hinted at anywhere in the JLS except for the two occurrences of "? super glb(U, V)" in the definition of lcta).
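For the skeptical, the associativity failure can be checked mechanically. Here is a toy model of lcta (my own sketch: types encoded as strings, with lub and glb hard-wired for a universe containing just the unrelated classes P and Q under Object):

```java
public class LctaDemo {
    // In this tiny universe, distinct class types have lub Object
    // and glb equal to their intersection.
    static String lub(String u, String v) { return u.equals(v) ? u : "Object"; }
    static String glb(String u, String v) { return u.equals(v) ? u : u + " & " + v; }

    // Strip a wildcard down to its bound.
    static String bound(String t) {
        if (t.startsWith("? extends ")) return t.substring(10);
        if (t.startsWith("? super ")) return t.substring(8);
        return t;
    }

    // The lcta cases from JLS3 p. 465, taken up to commutativity.
    static String lcta(String a, String b) {
        boolean aSup = a.startsWith("? super "), bSup = b.startsWith("? super ");
        boolean aExt = a.startsWith("? extends "), bExt = b.startsWith("? extends ");
        String u = bound(a), v = bound(b);
        if ((aSup && bExt) || (aExt && bSup))   // lcta(? extends U, ? super V)
            return u.equals(v) ? u : "?";
        if (aSup || bSup)                       // lcta(U, ? super V), lcta(? super U, ? super V)
            return "? super " + glb(u, v);
        if (aExt || bExt)                       // lcta(U, ? extends V), lcta(? extends U, ? extends V)
            return "? extends " + lub(u, v);
        return u.equals(v) ? u : "? extends " + lub(u, v); // lcta(U, V)
    }

    public static void main(String[] args) {
        // (? super P) lcta (P lcta Q)  versus  ((? super P) lcta P) lcta Q
        System.out.println(lcta("? super P", lcta("P", "Q")));
        System.out.println(lcta(lcta("? super P", "P"), "Q"));
    }
}
```

Running it prints ? for the first grouping and ? super P & Q for the second, matching the hand computation above.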

2008-11-05

Does P equal NP? This is not an answer.

A preprint by Sten-Åke Tärnlund, purporting to prove that P≠NP, is making its rounds on the internets. I noticed it through a blog post by Bruce Schneier; he quite sensibly notes that "these sorts of papers make the rounds regularly, and [Schneier's] advice is to not pay attention to any of them."

Still, someone has to try to pick such papers apart, if only to make sure that they are as worthless as statistics suggest. I'll volunteer for this one. (This mini-review was originally intended to be a comment to Schneier's posting, but grew a bit too large for that. Also, I want to use markup that plain-text comments cannot contain).

It is not easy to figure out what the author is getting at, because the paper is pithy to the point of sloppiness. Take, for example:

Definition 8 p(a) for c·|a|^q some c q ∈ N any a ∈ L.
By comparison with other similarly terse definitions, this apparently means
Definition 8. Let p(a) abbreviate c·|a|^q for some c and q in N, and any list a.
But what is the scope of the quantifiers on c and q? Are they constant throughout the paper or can they depend on something? What can they depend on? It makes no sense in context to let them depend on a ... Complexity theory has no lenity for those who fail to be rigorous about the order and scope of quantifiers.

Formally, the paper begins to go wrong no later than Definition 13, where T(s,a,u) is defined to mean something that involves the "truth" symbol ⊨ (or |= if your font, like mine, has no native Unicode glyph for this symbol), which is not in the language of B – but in the remainder of the paper T(s,a,u) is treated as if it were a formula of B, and bogosity ensues.

Of course, a formal error does not mean that the idea behind the paper is flawed. But it does not bode well – one would think that an author who is smart enough to solve an open problem as famous and long-standing as this would be more careful. Indeed, a true flaw shows up:

As far as I understand it, the main argument in the paper goes somewhat like this: Assume that you give me a Turing machine that purports to solve SAT (i.e., decide whether a propositional formula is a tautology) in polynomial time. Then I'll try to run a sufficiently large "pigeonhole formula" PFm through your machine. I already know that this formula is a tautology, but if your machine tells me that in polynomial time, I can extract a short proof of the formula from what the machine does. This contradicts a theorem by Haken which says that proofs (in a certain restricted form) of the pigeonhole formula are always long. Therefore, your machine either does not solve SAT or does not do it in polynomial time.

What first meets the eye here is that the paper appears to redefine SAT. The SAT we all know and love is about propositional formulas (i.e., expressions built from Boolean variables and the Boolean operators not, and, or, and so forth). However, PFm, which the paper tries to use the SAT solver on, is not propositional. It is defined only informally here, but for it to have any relation to Haken's theorem, it needs to contain non-Boolean variables and equality operators. Those belong to predicate calculus, not propositional logic. However, this is not in itself damning, because satisfiability in the pure first-order theory of equality is easily seen to be in NP. A valid proof that this is outside P would also solve the P=NP question.

Correction (2008-11-09): It turns out that there are indeed purely propositional formalizations of the pigeonhole principle, and the paper must be referring to one of those. I had in mind a formula such as "(a=1 or a=2) and (b=1 or b=2) and (c=1 or c=2) implies a=b or a=c or b=c". What I did not notice (and, in my defence, the paper never writes out an example of its PFm) was that we can just expand, say, "a=b" to "(a=1 and b=1) or (a=2 and b=2)", in which case we can take things like b=2 as propositional variables.
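For small instances, that propositional pigeonhole formula can be verified by brute force. Here is a sketch (my own encoding, with variables x[i][j] meaning "pigeon i sits in hole j"):

```java
public class Pigeonhole {
    // True iff "every one of `pigeons` pigeons is in one of `holes` holes
    // implies some hole holds two pigeons" holds under every assignment.
    static boolean isTautology(int pigeons, int holes) {
        int vars = pigeons * holes;
        for (int m = 0; m < (1 << vars); m++) {
            // Decode the m-th truth assignment.
            boolean[][] x = new boolean[pigeons][holes];
            for (int i = 0; i < pigeons; i++)
                for (int j = 0; j < holes; j++)
                    x[i][j] = ((m >> (i * holes + j)) & 1) != 0;
            // Antecedent: every pigeon is in some hole.
            boolean allPlaced = true;
            for (int i = 0; i < pigeons; i++) {
                boolean placed = false;
                for (int j = 0; j < holes; j++) placed |= x[i][j];
                allPlaced &= placed;
            }
            // Consequent: two distinct pigeons share a hole.
            boolean collision = false;
            for (int i = 0; i < pigeons; i++)
                for (int k = i + 1; k < pigeons; k++)
                    for (int j = 0; j < holes; j++)
                        collision |= x[i][j] && x[k][j];
            if (allPlaced && !collision) return false; // found a counterexample
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isTautology(3, 2)); // three pigeons, two holes: true
        System.out.println(isTautology(2, 2)); // two pigeons fit without sharing: false
    }
}
```

Note that this encoding uses only propositional variables – no equality predicate – which is exactly the point of the correction.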

The true fallacy here, however, is the tacit assumption (partly concealed as Definitions 13 and 14) that if we have a Turing machine that recognizes tautologies, then a short proof that this machine answers yes for a given formula corresponds to a short proof that the formula is indeed a tautology. But this is not necessarily true; it is at least conceivable that there is a fast Turing machine that recognizes tautologies but that we cannot prove that this is what it does. In that case, a trace of the Turing machine's action does not correspond to any proof that the formula is a tautology, much less a short one. And even if the machine provably recognizes tautologies, we cannot necessarily extract a short proof in a particular formalism from a short run of the machine.

There is plenty of additional smoke and mirrors, including the central proposition, which is stated thus:

Theorem 1 SAT ∉ P is true in a simply consistent extension B' of theory B.
Does this mean simply that there is some consistent extension of B in which SAT is not in P, or that SAT∉P is true in every consistent extension of B? I cannot tell. In the former case, the proposition says nothing about the standard model for B (i.e. Turing machines modeled with the standard integers, which are not necessarily a model for the extension). In the latter case, why bother speaking about extensions in the first place?

In conclusion, Schneier is right, and this can safely be ignored.

2008-10-06

Legal consistency

Some time ago I asserted that American court opinions "are fairly accessible once one learns a few key words and turns of phrase". Suddenly I begin to doubt that.

In EEOC v. Lee's Log Cabin (7th Cir., 2008-10-06) I read this marvellous piece of logic:

The dissent also asserts that "[i]t is undisputed that at all relevant times, Stewart was not only HIV positive . . . but she also had AIDS. So the allegation in the complaint that Stewart was 'HIV positive' is consistent with the fact that she has AIDS." Dissent, at p. 18. No, it's not, but the opposite is true. That is, an allegation that Stewart has AIDS is consistent with an allegation that she was HIV-positive, but not vice versa. If a person has AIDS she is necessarily HIV-positive; but a person who is HIV-positive does not necessarily have AIDS.
(footnote 3). In the kind of logic I was taught, "consistency" between two propositions means that it is conceivable that both can be true at the same time, or (more syntactically) that assuming both will not let you derive a self-contradiction. This is a symmetric notion. It makes no sense to assert that A is consistent with B yet B is not consistent with A.
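In symbols (standard propositional semantics; the notation is mine, not the court's): two propositions are consistent iff some valuation satisfies both, and that condition is symmetric because conjunction commutes:

```latex
\mathrm{Con}(A,B) \iff \exists v.\ v \models A \wedge B
                  \iff \exists v.\ v \models B \wedge A
                  \iff \mathrm{Con}(B,A)
```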

I wonder whether there is a particular legal meaning of "consistent" which is different enough from the logical one to make the above quote sensible.

2008-08-23

More precise than a million monkeys with typewriters

Ferdinand Foch is remembered for remarking, when the Treaty of Versailles was signed, that it was "not a peace but a 20-year armistice". This was eerily prophetic – he was right to within a handful of months. Foch must have had an unusually precise sense of the forces that make history go.

Or did he? I wonder how many would-be prophets characterized the treaty as an armistice for 10, 15 or 25 years, and were quietly forgotten by history.

It's easy to pick successful prophecies when you're allowed to wait until they come true before you decide which prophecies to point to.

2008-08-17

Why copyright is doomed

In the beginning there was no copyright.

Books were written (and music composed) by men of independent means, or by talented men sponsored by princes and nobles, or by scholars or clerics who were fortunate enough to have the power to decide that writing a treatise on this or that just happened to be part of their own duties. Somehow world literature got along.

Then somebody invented the printing press. The press, together with other technological advances during the 1400s, made mass production of printed books a feasible alternative to hand copying of manuscripts. And this changed everything.

First and foremost, printing books is a capital-intensive undertaking. In addition to the cost of the printing press and its various paraphernalia, setting type by hand is slow and costly. It makes sense to analyze the typesetting cost as capital rather than labor, because once the type is set, you can print copies of the book very cheaply. You can make your customers pay quite a bit more than your marginal cost for each item you sell, which will allow you to break even in the end. And the customers are not going to care whether you've broken even or not, so you can still charge the high price after you break even. If there's enough demand for the title, printing books can be quite a profitable business.
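The break-even arithmetic can be made concrete with a toy calculation (all the numbers here are invented for illustration):

```java
public class BreakEven {
    public static void main(String[] args) {
        double typesettingCost = 500.0; // capital: setting the type, paid once
        double marginalCost = 0.5;      // paper and ink per additional copy
        double price = 2.5;             // what a customer pays per copy

        // Each copy sold recovers (price - marginalCost) of the capital outlay.
        int breakEvenCopies = (int) Math.ceil(typesettingCost / (price - marginalCost));
        System.out.println(breakEvenCopies); // 250 copies; beyond that, near-pure profit
    }
}
```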

Now, printers who choose to print books that turn out to be popular become rich; printers who choose badly do not. If we assume that the popularity of one title over another is due to some special talent or effort expended by its author, it begins to look quite unfair that the printer should become rich because of them and the author get nothing. That is what the first copyright laws set out to remedy, after the changes wrought by Gutenberg had had a few centuries to sink in.

Actually, it was a bit more complex than that. Publishers – who around this time were emerging as a separate trade of funders and distributors of books – also benefited from copyright: Regulation of who could print what reduced the risk that another publisher would start printing the same book, leading to a price war (which would lose money for both printers as the price per copy dropped towards the marginal production cost). But the moral justification for having a copyright system in the first place was that it let authors get a cut of the profit when the quality of their works became the foundation of somebody else's business.

During those same centuries, medieval religious drama gave way to professional theatre, and instead of semi-improvised lines that changed with time according to audience reactions and the actors' own wit, plays came to have written-down scripts authored by named playwrights. Theatre was now a business, and a company could make a profit with a good performance of a good play – and thanks to the printing press, the playwright need not be affiliated with the theatre company.

In music, partly in response to the demands of theatre, ensembles grew in size and complexity – beyond the point where the performance could be structured by oral agreements and remembered tunes. Music notation expanded to meet the needs of orchestrally scored music. Eventually the public concert where professional orchestras performed for paying audiences was invented. Having a good work by a recognized composer on the billboard certainly helped draw in those paying audiences.

Playwrights and composers now had to suffer that not only could publishers make a business of printing their works – companies and orchestras could make a business of performing them. In response, copyright expanded to cover "public performances" in addition to simply the making of physical copies.

As I have insisted above, the moral purpose of copyright was still to save creators from the indignity of seeing somebody else build a profitable business upon their work without getting a share of the success themselves. But as a side benefit, creators who did create successes got an income source and the opportunity to quit their day jobs and create yet more successes. This benefits society as a whole. It was seen as an independent motive for copyright as early as 1787, when the drafters of the American constitution allowed for federal copyright legislation "to promote the Progress of Science and useful Arts".

Sound recordings were invented in the late 1800s and quickly integrated into the basic structure of copyright. They fit well there, because it requires specialized and expensive equipment to manufacture records, and large specific costs to prepare a recording for being mass produced – exactly the qualities that made copyright the right legal tool for printed books centuries earlier. As for books, creators of successful records have been able to quit their day job, sometimes spectacularly so.

Movies also got copyright. Again, production of film prints is a specialized commercial operation, and public performance is usually organized by businesses too. This fits the philosophical framework of copyright fairly well – though in comparison to a novel or a symphony, which are usually the work of an individual genius, creating a movie is quite a team effort. Copyright fairly easily let itself be extended to deal with large team efforts.

In each of these successful cases, copyright has felt like a right and just tool, because the activity being regulated – whether printing a book, performing an opera, stamping a record, or showing a movie – was naturally and necessarily a commercial undertaking. Done by a business. For a profit, which it is only reasonable to share with a creator.

Well, well, well. Guess what is not true anymore?

We see the underlying assumption break down most clearly for recorded music. For at least a decade there has been no practical reason why copies of sound recordings need to be made by a business rather than by private individuals. The technology is cheap and ubiquitous, and the shift in public perception of copyright has been quick and pervasive. Very few people today really feel that it is morally wrong to create an unlicensed copy of a commercially released song they like. Those of us who nevertheless refrain from doing so do it for the warm fuzzy righteous feeling of complying with the law even when the law is unreasonable.

Movies are not far behind music in this development. Many kinds of books still hold out, because people like the feeling of real bound paper books better than reading from a computer screen. But if print-on-demand hardware becomes ubiquitous enough that bound paper books are not made by specialized businesses anymore (or if really acceptable ebook readers emerge), I predict that the respect for copyright of books will go the same way as that for music. And, of course, already some kinds of non-fiction books are dying breeds, being replaced by free internet versions.

Music and movie publishers everywhere know all this, and are understandably all up in arms. Their response so far is threefold:

  1. Try to put the genie back into the bottle. Have copying technology outlawed or forcibly crippled. If this prevents amateur artists from creating their own works that may compete with commercial offerings, then so much the better.

    If successful this would lead to a nightmare world, but I don't think our politicians are that stupid. We have Vernor Vinge, Cory Doctorow and the EFF to explain why this shouldn't succeed; overall they do it well enough that I think it won't.

  2. Lobby legislatures to introduce ever more draconian laws to enforce copyright. Hope that you can bludgeon people into following rules out of fear that they won't follow out of fairness.

    At best, this is a short-term stopgap. At worst ... well, it doesn't really seem to have the desired effect out in the real world, does it? If anything it seems to reinforce the perception of copyright as an instrument of oppression, rather than a rule that works for the common good.

  3. Appeal to everybody's better nature such that we will all begin to think it is wrong to make private copies of music even if no business is involved. This is the "artists deserve compensation" argument, and it is very convincing on its face. It is also, I think, subtly flawed, and the main point of this essay is to explain what is wrong with it.

    The hidden assumption of the argument is that the creator suffers some kind of moral wrong each time someone enjoys his work, and that the purpose of royalties is to compensate him for that wrong. But what is that wrong? It cannot really be the lack of opportunity to capitalize on the copyright, because the whole point of the argument is to motivate copyright in the first place. But stripped of that explanation, the assumption is simply stupid – creators want their work to be enjoyed, and are pleased when people do so. True, they also want to quit their day job, which requires that income can somehow be derived from the work, but it does not follow that this income has to be in the form of per-copy royalties. And wanting the conclusion to be true does not make the argument hang together.

I think all three responses are likely to fail in the end. Copyright is doomed. It may take decades, or even centuries, for it to die out completely, especially because it has been entrenched in international treaties which are very difficult to change. But die out it must, for it cannot persist in a world where copying is no longer expensive and complex.

This state of affairs creates a political obligation to think ahead. We still want the world of tomorrow to be one where talented creators can quit their day job and create more. The challenge is to think up a system that does this and still respects the technological reality.

It is tempting to take as a starting point that creators should be rewarded in proportion to the enjoyment people derive from their work. But the trouble is that enjoyment is not easily quantified. In a way, the copyright system uses "number of copies" as a proxy for "amount of enjoyment", which works rather poorly. When making and keeping a copy is cheap and easy, people are likely to keep a copy of every work that they don't actively dislike, whether that work is one of sublime genius or merely passable.

There's a movement towards using "number of listenings/readings" as a proxy for amount of joy, such that if you want to read a book again you have to pay anew. This doesn't work well either. Many a time when I sit down to read a book, it proves to cause me no enjoyment at all and I conclude that the book is horrible. And when I reread a book it may not be because I enjoyed it the first time, but because I need to look up a dry fact about it.

I doubt a working calculus of joy can be developed. How about explicit public subsidies to outstanding creators, awarded by expert committee? That is already common in countries with languages that are perceived as too small for a national literature to flourish funded entirely by copyright, such as Danish. But I don't think it is attractive as a general solution; too much power is concentrated in the hands of the committee, and even with the best of intentions the committee cannot possibly have enough information to make a really informed choice.

Despair and accept that day jobs must be had? Most likely, music and literature would continue to exist – the equipment necessary to create either is within the reach of determined hobbyists. Movies wouldn't, however; it is mindbogglingly expensive to create a movie at the technical quality modern audiences expect. Can that be covered by the box office alone? Probably not in the long run, what with the increasing quality of home AV equipment. And a plan that means that movies cannot be made is not politically viable. Exit "despair".

I am left, for the time being, with no good proposed solution. But that does not make the problem go away. It is still there, whether we like it or not, and solving it before the death throes of copyright end should be high on the list for politicians concerned with culture.

2008-08-15

Confused? You won't be after this...

A customer needed to read some simple information out of our server software programmatically, and suggested that a "SOAP API" might be a good way to do that. To me fell the task of researching what that might mean.

SOAP is a lightweight protocol intended for exchanging structured information in a decentralized, distributed environment. Why, it says so right in the spec!

I dread to think of a protocol that the W3C would call "heavyweight".