I snarfed this article back from DejaNews
From: David Rush <kumo@sourceforge.net>
Subject: Relative performance results
Date: 26 Apr 2000 00:00:00 GMT
Message-ID: <w33bt2wciso.fsf@bellsouth.net>
References: <39063507.E0846C15@montana.campuscwix.net>
        <87hfcpwio9.fsf@soggy.deldotd.com>
        <8e6ble$hrs$2@pegasus.csx.cam.ac.uk>
        <6scGOSqY8spCPriWWx0+mZo84OYr@personalnews.de.uu.net>
        <w33zoqhc8e3.fsf@bellsouth.net>
        <sgf53q4tqtj108@corp.supernews.com>
Supersedes: <w33em7sciyn.fsf@bellsouth.net>
Organization: Netscape Communications Corporation
Newsgroups: comp.lang.scheme

"felix" <felix@anu.ie> writes:
> David Rush wrote in message ...
---But did not write the '> >>' material attributed to him below
> >> I did a small benchmark and Guile 1.3.4 is 67 and 22 times slower than
> >> code produced by the Bigloo compiler and the MzScheme compiler
> >> respectively.
> >>
> >> MzScheme interpreter is faster too.
> >MzScheme is *remarkably* fast. In fact, for my application which is
> >pretty heavy on closure creation and call/cc it is *faster* than
> >Bigloo.
> Do you mean MzScheme's interpreter or mzc, the compiler? 

Actually I was using the straight interpreter, which is what really
amazed me. 

#!mzscheme -qnmvde s2.plt "(module-main (vector->list argv))" -  $*
; my code here

No foolin', it was faster than Bigloo. I was wondering if they mmapped
the source file or something in order to read it so quickly.

> Doing some benchmarking I got the following results (Clinger's
> modified version of Gabriel's ctak, which is hellishly heavy on
> call/cc usage, and our old friend, tak):

<PLT vs Gambit vs Chez results snipped>
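
For anyone who doesn't have Gabriel's suite handy, the two benchmarks
look roughly like this. This is a from-memory sketch of the standard
versions, not Clinger's modification, which differs in its details:

;; tak: plain ternary recursion, no call/cc at all
(define (tak x y z)
  (if (not (< y x))
      z
      (tak (tak (- x 1) y z)
           (tak (- y 1) z x)
           (tak (- z 1) x y))))

;; ctak: the same recursion, but every result is returned by
;; invoking a captured continuation, so the timing is dominated
;; by call/cc cost.
(define (ctak x y z)
  (call-with-current-continuation
   (lambda (k) (ctak-aux k x y z))))

(define (ctak-aux k x y z)
  (if (not (< y x))
      (k z)
      (ctak-aux
       k
       (call-with-current-continuation
        (lambda (k) (ctak-aux k (- x 1) y z)))
       (call-with-current-continuation
        (lambda (k) (ctak-aux k (- y 1) z x)))
       (call-with-current-continuation
        (lambda (k) (ctak-aux k (- z 1) x y))))))

;; (tak 18 12 6) => 7, and (ctak 18 12 6) => 7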

I haven't yet been able to run under Gambit or Chez, so I can't say
how they compare under my application. As Lars Hansen pointed out to
me in private email, it's pretty hard to say for sure what factors
cause the speedup for the same code between different Schemes. But
Bigloo doesn't claim to be fast for call/cc (in fact it claims
significant performance penalties), and I saw a big performance
degradation (by roughly a decimal order of magnitude) when I went to
a coroutined reader using call/cc (with tokens as closure
objects). Again, Lars pointed out that call/cc and closure creation
don't have to be slow, so I figured I'd check out his impl to see if
he knew what he was talking about. Larceny proved it to me.
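
For the curious, here is a minimal sketch of that coroutining style.
It is not the S2 reader -- the names are made up -- but it shows why
the approach leans so hard on call/cc: every suspend and resume of
the token producer is a continuation capture.

;; MAKE-GENERATOR turns a producer procedure into a thunk.  Each
;; call to the thunk resumes the producer, which suspends itself
;; and hands back a token by calling YIELD.
(define (make-generator producer)
  (let ((resume-k #f)    ; where the suspended producer continues
        (return-k #f))   ; where the next token is delivered
    (define (yield token)
      (call-with-current-continuation
       (lambda (k)
         (set! resume-k k)
         (return-k token))))
    (lambda ()
      (call-with-current-continuation
       (lambda (k)
         (set! return-k k)
         (if resume-k
             (resume-k #f)
             (begin (producer yield)
                    (return-k 'eof))))))))

;; Example: a token stream as a closure object.
(define next-token
  (make-generator
   (lambda (yield)
     (for-each yield '(lparen foo bar rparen)))))

(next-token) ; => lparen
(next-token) ; => foo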

Anyway, my (unscientific) results. The following commands were run to
generate the timings for three of the four impls on which s2
runs. The task is bootstrapping s2 from its modularized form into a
single r5rs source, without macro expansion. Bigloo is not timed
because it has highly destructive interactions between the Boehm GC
and a multitude of call/cc's on Solaris. The hardware is an
UltraSparc 1 with something like 256M of memory.

foreach i (plt larceny s48)
echo for $i substrate:
time s2-$i --search common --search reader `make source` --output-file bench-$i
end

for plt substrate:     38.30u 0.86s 0:39.74 98.5%
for larceny substrate: 13.27u 0.96s 0:18.33 77.6%
for s48 substrate:     78.81u 2.05s 1:23.71 96.5%

PLT was run as shown above. Larceny and Scheme48 were both run by
invoking their VMs on heap images. I find PLT's performance amazing in
this context.

> > And Larceny is even faster.

QED

> The compiler or the interpreter?

This was a heap-dumped image, not native sparc machine code.

> Does anybody have real hard information on this implementation? The
> fact that it's only available on SPARC somehow limits its use.

There you go. I'm pretty pleased with it because it has good r5rs
compliance, too :) The sparc-only aspect *is* a bummer, but since I've
got good portability for my app, I just use Bigloo on Linux/486 (where
the collector doesn't lose under call/cc).

> BTW, I'm working on a big Scheme benchmarking survey

<serious loss of context through snippage>

Benchmarks are useful, but so are real application results. That's why
I've gone into some detail here. That's also why I like the Boyer
benchmark (especially with Baker's mods so you can see how the
performance varies wrt application impl strategies). I decided earlier
today to enable both readers in S2 so that I can (among other things)
compare that performance variance more easily.

Anyway, it's late and I'm really starting to ramble.

david rush
-- 
Who has just watched yet *another* 3-hour C++ build fail for no very
good reason :(