mhonarc-dev

Re: [bug #14747] major (10X) memory savings possible in some situations

2006-07-19 13:09:03
On July 19, 2006 at 18:44, Andrew Shirrayev wrote:

> one BIG letter
> $ ls -l
> -rw-r--r--  1 andrews andrews 29197818 Jul 19 18:42 mbox.200410.one
> $ wc mbox.200410.one
>  1199719  2958850 29197818 mbox.200410.one
>
> 1st way:
>
> <TextEncode>
> utf-8; MHonArc::UTF8::to_utf8; MHonArc/UTF8.pm
> </TextEncode>

Did you also set:

  <!-- With data translated to UTF-8, CHARSETCONVERTERS can be simplified -->
  <CharsetConverters override>
  default; mhonarc::htmlize
  </CharsetConverters>

  <!-- Also need to register a UTF-8-aware text clipping function -->
  <TextClipFunc>
  MHonArc::UTF8::clip; MHonArc/UTF8.pm
  </TextClipFunc>
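
To illustrate why a UTF-8-aware clipping function matters once the data is
UTF-8, here is a minimal sketch in Python. It is not MHonArc's Perl
implementation; the function names clip_bytes and clip_utf8 and the sample
subject are hypothetical and only show the general problem of cutting a
string at a byte offset versus a character offset.

```python
def clip_bytes(data: bytes, limit: int) -> bytes:
    """Naive clip: cut at a byte offset, ignoring character boundaries."""
    return data[:limit]


def clip_utf8(data: bytes, limit: int) -> bytes:
    """Clip UTF-8 data to at most `limit` characters (code points)."""
    return data.decode("utf-8")[:limit].encode("utf-8")


# Hypothetical subject line containing two-byte UTF-8 sequences.
subject = "Grüße aus Köln".encode("utf-8")

naive = clip_bytes(subject, 3)  # cuts inside the two-byte sequence for "ü"
aware = clip_utf8(subject, 3)   # keeps the first three characters intact

print(naive)                    # no longer valid UTF-8
print(aware.decode("utf-8"))
```

The byte-oriented clip leaves a dangling lead byte, which typically shows up
as a broken character in the generated index pages. Clipping by character
avoids that.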

If you use TEXTENCODE, you can avoid dealing with MHonArc::CharEnt
with the above CHARSETCONVERTERS.  Without the above, MHonArc will
convert all non-ASCII UTF-8 sequences into entity references.
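
As a rough illustration of the difference in output (a Python sketch, not
MHonArc code; the exact entity form MHonArc emits may differ, and the
variable names are made up for this example), compare turning every
non-ASCII character into a reference with passing UTF-8 through and
escaping only the HTML special characters:

```python
import html

text = "Grüße <ok>"

# Entity-reference style: each non-ASCII character becomes &#NNN;
# while ASCII characters are left as they are.
as_entities = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Pass-through style: only &, < and > are escaped; the result stays UTF-8.
as_utf8 = html.escape(text, quote=False).encode("utf-8")

print(as_entities)
print(as_utf8.decode("utf-8"))
```

The first form inflates every non-ASCII character into several ASCII bytes
and needs a mapping step per character; the second leaves the UTF-8 bytes
alone.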

In general, if you use TEXTENCODE, you should also redefine
CHARSETCONVERTERS appropriately.

--ewh

---------------------------------------------------------------------
To sign-off this list, send email to majordomo(_at_)mhonarc(_dot_)org with the
message text UNSUBSCRIBE MHONARC-DEV
