In my experience developing an optimizing XSLT compiler, missing tail-recursion
elimination and the other obvious optimizations are not the biggest
performance killers.  Instead, it is the fact that the language requires you
to express many concepts as expensive loops that iterate over far more nodes
than are necessary.
This is not a criticism of the language; I enjoy XSLT, and I know the benefits
of a functional design, but it is a fact that most of the art of truly
optimizing XSLT involves divining what the user wants you to do and doing
it differently from how it was expressed in the .xsl file.
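To make that concrete, here is a minimal sketch of the kind of rewrite I mean.
The element and attribute names (order, customer, customer-ref) and the key
name are hypothetical, made up purely for illustration.  The "scan" template
is how the lookup is often written: for every order it walks every customer
in the document.  The "keyed" template produces the same output from an index
built with xsl:key.  An optimizer could, in principle, recognize the first
form and treat it like the second; I am not claiming any particular processor
does exactly this.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical index: maps each customer's id attribute to its element -->
  <xsl:key name="customer-by-id" match="customer" use="@id"/>

  <xsl:template match="/">
    <!-- Switch mode to "scan" to compare; the output should be identical -->
    <xsl:apply-templates select="//order" mode="keyed"/>
  </xsl:template>

  <!-- As expressed: rescans all customers once per order -->
  <xsl:template match="order" mode="scan">
    <xsl:value-of select="//customer[@id = current()/@customer-ref]/name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- As intended: one keyed lookup per order -->
  <xsl:template match="order" mode="keyed">
    <xsl:value-of select="key('customer-by-id', @customer-ref)/name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With N orders and M customers, the scan form touches on the order of N*M
nodes, while the keyed form touches roughly N+M, assuming the processor
implements keys with some kind of index.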
Also, you should keep in mind that C/C++ compilers have been around for a long
time and all those optimizations that you take for granted, such as constant
folding, loop strength reduction, etc., were by no means universal when the art
was young.
XSLT interpreters and compilers are relatively young, and I'm sure that as the
language matures, a standard set of optimizations will emerge and become more
reliable.  We don't know yet what the best optimizations for XSLT are; things
like tail recursion are a good idea, but they are not the holy grail of XSLT
optimization.
Furthermore, optimizations like memoization are not always wins; they may be
good for your test case, but other tests will show slowdowns
and unnecessary memory use.
<shameless-plug>If you are really interested in top-notch XSLT performance, you
should check out DataPower's XA35/XS40.  It is powered by an XSLT JIT compiler
specifically intended to provide excellent performance.  If performance is
really important to you, it's worth a look.  We also have a built-in XSLT
profiler, support for EXSLT extensions, etc.</shameless-plug>
my 2 cents,
Niko