From: David Rush (kumo@bellsouth.net)
Subject: Typechecks Suck! (Was: Surprising performance results (or Larceny Rocks!))
Newsgroups: comp.lang.scheme
Date: 2001-07-23 10:04:19 PST
Sorry it's been so long folks, but RL intervened a bit. Here is the
follow-up I promised to some of you concerning the performance of
various Scheme implementations on a real-life program that I was
using for statistical analysis.

David Rush <kumo@bellsouth.net> writes:
> For my benchmark I used 1000 words randomly culled from
> /usr/dict/words on Solaris 2.6. All the code was run on an unladen
> swallow^W Sun Ultra-60 running Solaris 2.8
> 
> And the winner is: Larceny!
> 
>        larceny: 97.30u 0.29s 1:37.73 99.8%
>         stalin: 113.03u 0.04s 1:53.13 99.9%
>         bigloo: 235.63u 0.21s 3:56.04 99.9%
>        chicken: 410.29u 3.80s 6:54.17 99.9%
> PLT (bytecode): 689.90u 0.67s 11:30.64 99.9%
> 
> My theory as to why:

Was complete and utter shite. Issue #1 was simply naive use of the
compiler options. Between diddling Scheme and gcc options I was able
to speed up the program anywhere from twofold to eightfold. On the
average gcc options made more of a difference than Scheme compiler
options, but I didn't bother to keep the intermediate results to prove
it[1]. I believe that this parallels the experience of the OCaml
community, who (IIRC) have a fairly simple (high-level) optimizer but
a bang-up code generator.

I would like to take a minute to defend the validity of my original
results. Obviously they are fairly useless from an academic POV;
however they are indicative of the (for lack of a better term) 'user
experience'. I would guess that 80% of software development is done in
a 'make it work'/'ship it' mode. Very little time is spent on
optimization[2] for two reasons: bugs and bugs. The first 'bugs' is
historical; I remember days when turning on compiler optimizations
meant that you had a good chance of getting obviously incorrect code
generated. This still happens occasionally, although not nearly as
often. The second 'bugs' derives from the fact that optimization can
expose real bugs and/or make finding `hidden' bugs (currently
undiscovered but which would have been bugs w/out optimization) very
difficult. These effects combine to make me (and many other engineers)
trust that the compiler-writers have given us `optimal' (for certain
values of optimal) settings for the compiler `out-of-the-box'.

Anyway, this is clearly not true in the Scheme world. Perhaps because
it is much easier to perform correctness-preserving translations in
Scheme, while it is still difficult to maintain debuggability, Scheme
implementors have weighted the compilers in favor of debugging. Either
way, things are clearly much better than they used to be.

Performance issue #2 was, as Jeffrey Mark Siskind pointed out to
me in his exploration of Stalin's poor performance on this test,
my sloppy usage of the Scheme type system. In fact nearly all systems
had significant performance gains from removing run-time type
checks. Stalin, of course, made the biggest gains in this area, running
14 times faster after rationalizing the type usage in the program[3].

Anyway, on to the results. Unfortunately this is not exactly an apples
to apples comparison because the machine I was using even ran the
unmodified code slower today. E.g.

Larceny (previous): 97.30u 0.29s 1:37.73 99.8%
   Larceny (today): 103.20u 0.61s 1:44.45 99.3%

Gremlins, I guess. This means that by tweaking Larceny's options, I
only got another 6% out of it as can be seen below.

So with the best optimizations I have found thus far:

         Bigloo: 28.09u 0.03s 0:28.17 99.8%
        Chicken: 223.43u 1.98s 3:46.14 99.6%
         gambit: 47.24u 1.23s 0:48.83 99.2%
        larceny: 97.68u 0.42s 1:38.30 99.7%
            PLT: 706.91u 0.74s 11:49.24 99.7%
         Stalin: 7.94u 0.08s 0:08.13 98.6%

The winner is: Stalin! by a factor of three over Bigloo.
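
For anyone who wants to check the arithmetic, here is a minimal sketch
in Python (not one of the languages under test, used only as a
calculator) that recomputes the ratios from the user-CPU times quoted
in this post. The function and variable names are made up for
illustration. Note that Stalin's earlier 113.03u figure was measured
on the previous run of the machine, so that ratio is only approximate.

```python
# User-CPU seconds copied from the "best optimizations" table in this post.
best_user_seconds = {
    "Bigloo": 28.09,
    "Chicken": 223.43,
    "gambit": 47.24,
    "larceny": 97.68,
    "PLT": 706.91,
    "Stalin": 7.94,
}


def slowdown_vs_fastest(times):
    """Return {name: time / fastest time}, i.e. how many times slower each one is."""
    fastest = min(times.values())
    return {name: t / fastest for name, t in times.items()}


ratios = slowdown_vs_fastest(best_user_seconds)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:>8}: {ratio:6.2f}x the fastest time")

# Larceny: unmodified code as run today (103.20u) vs. tuned options (97.68u).
larceny_gain = 1 - 97.68 / 103.20

# Stalin: time from the first round of results (113.03u) vs. the tuned time.
stalin_speedup = 113.03 / 7.94

print(f"Larceny time saved by tuning: {larceny_gain:.1%}")
print(f"Stalin speedup after type cleanup: {stalin_speedup:.1f}x")
```

Dividing by the fastest time rather than by a fixed baseline keeps the
table readable however the ranking shifts in later runs.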

Many thanks go out to the various people who suggested optimizations,
but particularly to Jeffrey Mark Siskind and Brad Lucier who
corresponded with me at length about Stalin and Gambit, respectively.

I will be re-posting the code, results and build instructions Real
Soon Now, for those of you who think you can improve upon these
results ;)

david rush

[1] Actually, I kept them when I added Gambit-C to the test suite. The
    difference between naive gcc and best gcc (for sparc) was a 30%
    speed up. The reported Gambit performance improves on this by also
    using -D___SINGLE_HOST, which I assume helps the macro magic in
    gambit.h avoid trampoline calls.
[2] In 18 years professionally, I have only worked at 2 companies
    that bother with optimization.
[3] Which involved adding SRFI-9 support to S2 so that all the Schemes
    could participate. I haven't yet been able to figure out how to
    use Gambit's native structures and I haven't yet had time to
    incorporate Bigloo's. In both cases I am sceptical about the
    cost/benefits as using the native structures won't eliminate the
    type *checking*, the way that Stalin can.

-- 
Thieves respect property. They merely wish the property to become
their property that they may more perfectly respect it.
	-- The Man Who Was Thursday (G. K. Chesterton)