Iulian Dragos


Yesterday I played a bit with a different build tool. Frankly, ANT is not at all my favorite:

  • XML syntax is plainly wrong for configuration files. It's verbose, redundant, and very unpleasant to read
  • ANT forces you to master two sets of commands: one for the command line, and an equivalent XML encoding (think of <mkdir>, <copy>, etc.)
  • Extending ANT is heavyweight: you need to get out of your 'nice' XML world and start programming in Java, then you delve into the hell of CLASSPATH problems for your private tasks
  • It has no means to 'encapsulate' or reuse pieces of code, besides the <import> directive, which is just an #include

Of course, make is traditionally thought of as quirky and bad, but the only downside I see is that there's no simple equivalent on Windows (partly because it relies heavily on external commands to get things done).

The tool I tried is called SCons and seems to be used by KDE and id Software, which is pretty good already. It is based on Python, so you have the power of a scripting language. It is portable, and build files are platform independent as long as all 'pure' Python calls go through system libraries -- like all path manipulations, etc. I managed to make a 'builder' for scalac quite quickly, and have a short buildfile that builds NSC.

import os

env=Environment(ENV = os.environ)

# read the nsc source files
nsc_files = []
for line in file("config/list/nsc.lst"):
    if line[0] != '#' and line[0] != '\n':
        nsc_files.append('sources/scala/tools/nsc/' + line.rstrip('\n'))

# Compile nsc (env.Scalac is the custom builder described below)
env.Scalac('/tmp/scons_nscClasses', nsc_files)
env.Jar('/tmp/jars/scons_nsc.jar', '/tmp/scons_nscClasses')
env.Clean('nsc', '/tmp/scons_nscClasses')

Alias('nsc', 'compile.nsc')
Alias('nsc', '/tmp/jars/scons_nsc.jar')

The call to scalac is hidden in the call to 'env.Scalac', which invokes my custom builder. It is pretty straightforward to write such builders. What's cool is that dependencies are automatically found by SCons for most languages (for example, they have a stripped-down Java parser to collect dependencies between classes). The interface is specified, so new so-called scanners can be written.
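To give an idea, registering such a builder takes only a few lines. This is just a sketch, not the builder I actually wrote -- the action string, flags, and factories are my guesses:

```
# SConstruct fragment (a sketch -- the scalac flags here are
# illustrative, not the real builder). Assumes scalac is on the PATH.
import os

env = Environment(ENV = os.environ)

# a builder that compiles its sources into a target directory
scalac = Builder(action = 'scalac -d $TARGET $SOURCES',
                 target_factory = Dir,
                 source_factory = File)
env.Append(BUILDERS = {'Scalac' : scalac})

# after this, env.Scalac('/tmp/scons_nscClasses', nsc_files) behaves
# like any built-in builder
```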

The only downside I see is the fact that build files have to be valid Python code, which makes task definitions look like method calls.

The other option I checked (but not by writing anything) is Rake, or Ruby make. This is cool because I like Ruby syntax more, but it has no default tasks. Basically, it provides a nice engine for dependency checking, while actual compilation steps are implemented as Ruby code. I guess there are some standard tasks around. Martin Fowler has a nice introduction to it.
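The core idea -- a dependency engine where tasks are plain code -- fits in a few lines. Here is a toy sketch in Python (not Rake itself; the task names are invented):

```python
# A toy dependency engine in the spirit of Rake: tasks are plain
# functions with declared prerequisites, and the engine runs each
# task once, in dependency order. Task names are made up.

tasks = {}

def task(name, deps=()):
    """Register a function as a task with its prerequisites."""
    def register(fn):
        tasks[name] = (deps, fn)
        return fn
    return register

def invoke(name, done=None):
    """Run 'name' after its dependencies, each at most once."""
    done = set() if done is None else done
    if name in done:
        return
    deps, fn = tasks[name]
    for d in deps:
        invoke(d, done)
    fn()
    done.add(name)

trace = []

@task('compile')
def compile_():
    trace.append('compile')

@task('jar', deps=('compile',))
def jar():
    trace.append('jar')

invoke('jar')
print(trace)  # ['compile', 'jar']
```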

The bottom line is, ANT is not that great, and there are alternatives. The most important argument in favor of ANT is that it's very widespread and has good tool support. However, Emacs is as good at ANT as at Python or Ruby, and people trying Scala could still use ANT to build their *own* projects. For the compiler, no one would be very much annoyed by trying a new build tool (I guess some people on the mailing list have already installed Ruby ;))


Being happy to move a bit further with nsc, I decided to write some bits in my lonely blog. As Sebastian used to say, we're moving at an average speed of 2 blog entries per month. Sheesh... that's not a lot.

The last few days I implemented a simple verifier for icode. Icode stands for intermediary code, and is a lot like JVM bytecode. In the beginning, its main purpose was to allow mixins to be expanded from compiled code (being so close to the machine level, it was easy to reverse-transform from bytecode to icode). Now that Martin has found a different encoding of mixins that allows separate compilation, icode will be just a common representation for the two backends: JVM and .NET. Some optimizations may also make use of it -- most of the classical ones: copy/constant propagation, dead code elimination, etc.

The nsc compilation scheme is broken into several phases, and between phases the abstract syntax trees can be checked for consistency. Basically, it's just the type checker, which runs through the whole trees and makes sure everything is still well-typed. Once icode comes into play, I thought of writing a similar checker for this representation. This will be of great help once transformations are performed on icode. The checker will do a job very similar to the bytecode verifier's: basically, make sure every primitive operation has the right number of values (with the right types) on top of the stack, that every referenced field/local/method exists, that accesses don't break modifiers (like accessing private fields from other classes), and that stacks agree at merge points. The last point proved the trickiest.
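To illustrate the kind of simulation involved, here is a toy sketch (in Python, with a made-up two-opcode instruction set -- not the actual icode): each instruction declares what it pops and pushes, and the checker runs a basic block over a stack of types:

```python
# Toy sketch of an abstract type-stack checker. The opcodes and
# their signatures are invented for illustration.

class CheckError(Exception):
    pass

# opcode -> (types consumed from the stack, types produced)
SIG = {
    'PUSHI': ([], ['INT']),
    'IADD':  (['INT', 'INT'], ['INT']),
}

def check_block(instrs, stack=()):
    """Simulate a basic block, checking each operation's operands."""
    stack = list(stack)
    for op in instrs:
        consumed, produced = SIG[op]
        for expected in reversed(consumed):
            if not stack:
                raise CheckError("stack underflow at " + op)
            actual = stack.pop()
            if actual != expected:
                raise CheckError("expected %s, found %s at %s"
                                 % (expected, actual, op))
        stack.extend(produced)
    return stack

print(check_block(['PUSHI', 'PUSHI', 'IADD']))  # ['INT']
```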

A bit of background: icode is a control flow graph structure, dividing code into basic blocks of linear code. If control flow may reach a basic block on different paths, the stacks produced on all possible paths to that basic block have to 'agree'. That is, they have to have the same depth, and the types at corresponding positions should have a common 'ancestor'. This ancestor type is called a least upper bound (lub) of the two types. Code coming after the merge point may therefore call methods and access fields only of this lub type.
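Merging is easy to model over a small single-inheritance hierarchy, where the lub is always well defined (class names here are invented; the trouble with interfaces comes next):

```python
# Toy model of 'agreeing' stacks at a merge point, over a
# single-inheritance hierarchy (invented class names).

supertype = {'Cat': 'Animal', 'Dog': 'Animal',
             'Animal': 'Object', 'Object': None}

def ancestors(t):
    """The chain from t up to the root, t included."""
    chain = []
    while t is not None:
        chain.append(t)
        t = supertype[t]
    return chain

def lub(a, b):
    """First ancestor of a that is also an ancestor of b."""
    bs = set(ancestors(b))
    for t in ancestors(a):
        if t in bs:
            return t

def merge(stack1, stack2):
    """Merge two incoming stacks: same depth, lub position-wise."""
    if len(stack1) != len(stack2):
        raise ValueError("stacks disagree in depth")
    return [lub(a, b) for a, b in zip(stack1, stack2)]

print(merge(['Cat', 'Object'], ['Dog', 'Object']))  # ['Animal', 'Object']
```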

The problem was exactly this lub type: clearly it is interesting only for reference types (I ignore primitive types, for which a lub exists only if all the types are identical). A lub is guaranteed to exist in a (semi)lattice of types, but Java types with interfaces don't form one. For example:

interface I {..}
interface J {..}
class A implements I, J {..}
class B implements I, J {..}

Here, between A and B there is no unique lub. It could be either I or J, but there is no subtyping relation between the two. The lub should really be a combination of them, represented in Scala by 'I with J'. Interestingly enough, according to this paper, Sun's JVM verifier simply gives up and considers the lub of such types to be java.lang.Object! Of course, this is not sound by itself, and all interface calls have to be checked at runtime, since the verifier only makes sure that you get an Object on the stack. However, this was not an option for me, as I wanted a static checker (for runtime checks I'll just have the test suite, but that doesn't mean we catch errors early :-) ). Additionally, all calls in Scala are in fact interface calls, because of the translation scheme. Such a solution would be very limited indeed.
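The ambiguity is easy to reproduce mechanically. A small sketch (Python, with the subtyping relation of the example above given explicitly) computes the minimal upper bounds of A and B and finds two of them:

```python
# The A/B/I/J example: direct supertypes per type, then compute
# all minimal common supertypes of two types.

supers = {
    'A': {'I', 'J', 'Object'},
    'B': {'I', 'J', 'Object'},
    'I': {'Object'},
    'J': {'Object'},
    'Object': set(),
}

def all_supers(t, acc=None):
    """All supertypes of t, t included (reflexive-transitive closure)."""
    acc = {t} if acc is None else acc
    for s in supers[t]:
        if s not in acc:
            acc.add(s)
            all_supers(s, acc)
    return acc

def minimal_upper_bounds(a, b):
    """Common supertypes of a and b with no strict subtype among them."""
    common = all_supers(a) & all_supers(b)
    return {t for t in common
            if not any(u != t and t in all_supers(u) for u in common)}

print(sorted(minimal_upper_bounds('A', 'B')))  # ['I', 'J'] -- no unique lub
```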

Unfortunately, this insight came to me only after implementing some lub routines using the simple Java solution. Everything turned out to be simpler in fact, since I could use the Types.lub routine written for the type checker. One question still remains about such lub types of the form A with B with .. (let's call them compound types): are they always 'concrete', in the sense that a class has been generated for them? If some source file contains an expression like new A with B, a class declaration representing A with B will be generated.. but what if this lub has never been 'touched' before? The checker relies on being able to push a reference type to that class in the type stack simulator, and on using it for further checks. However, reference types refer only to existing Java-flavoured classes. Maybe this will have to be made more general, using full Scala types for the checking.