IIRC, some time back the recommendation used to be 10x, Mike?
If that's correct, what has changed, please? Is it just Saxon getting smarter?
I think I used to say 10x before the TinyTree came along, but that's a
very long time ago. Since the introduction of the TinyTree any
improvements have been relatively minor (e.g. whitespace compression).
4x is probably the best you'll achieve, but I've seen a number of people
report that. A more detailed sizing (assuming no attribute nodes, no
type information, no backwards navigation, and no keys) is:
  - 19 bytes per element node
  - 19 bytes for a whitespace text node
  - 19 + 2x bytes for a non-whitespace text node, where x is the number
    of characters
It's not unusual to see documents where most of the lines are, say, 40
characters long, and each accounts for one element, one whitespace text
node, and one 20-character text node. That is 19 + 19 + (19 + 2*20) = 97
bytes of TinyTree space for 40 bytes of source, giving an expansion
factor of about 2.4.
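The per-line arithmetic can be checked with a short script. This is only a sketch of the sizing rules quoted above, under the same assumptions (no attribute nodes, no type information, no backwards navigation, no keys); the function names are made up for illustration and are not part of Saxon. For the 40-character line the ratio works out to 97/40, a little over 2.4.

```python
# Per-node costs as quoted in the message; these are estimates, not
# values read from Saxon itself.
ELEMENT_BYTES = 19      # each element node
TEXT_NODE_BYTES = 19    # fixed cost of any text node, whitespace or not
BYTES_PER_CHAR = 2      # extra cost per character of non-whitespace text


def tinytree_bytes(elements, whitespace_text_nodes, text_lengths):
    """Estimate TinyTree size in bytes (hypothetical helper).

    text_lengths holds the character count of each non-whitespace
    text node.
    """
    total = elements * ELEMENT_BYTES
    total += whitespace_text_nodes * TEXT_NODE_BYTES
    total += sum(TEXT_NODE_BYTES + BYTES_PER_CHAR * n for n in text_lengths)
    return total


def expansion_factor(source_bytes, elements, whitespace_text_nodes, text_lengths):
    """Ratio of estimated TinyTree bytes to source bytes (hypothetical helper)."""
    estimate = tinytree_bytes(elements, whitespace_text_nodes, text_lengths)
    return estimate / source_bytes


# One 40-byte source line: one element, one whitespace text node,
# one 20-character text node.
print(tinytree_bytes(1, 1, [20]))                   # 97
print(f"{expansion_factor(40, 1, 1, [20]):.1f}")    # 2.4
```

Feeding in node counts from a real document (for example, gathered by a SAX pass) would give a whole-document estimate on the same basis.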
In my IEEE Data Engineering paper a couple of years ago at
http://sites.computer.org/debull/A08dec/saxonica.pdf , I measured the
memory occupied by the 100Mbyte XMark test document at 327Mbytes, and
this agreed well with the theoretical sizing.
Michael Kay
Saxonica