mhonarc-dev

Re: Suggestions for improving MHA's i18n support

2002-09-13 01:16:49
On September 13, 2002 at 02:55, Mooffie wrote:

No, numeric character references are _independent_ of the encoding of the 
page, because they specify the unicode number of the character.

That is, "&#254;" is always the letter Thorn, no matter what the encoding
is.

In SGML, this is not true, but for HTML and XML it is.  This is
a deviation from SGML, albeit a reasonable one, so I made a wrong
assumption about HTML.  Also, the character mappings in MHonArc were
created back around '94-'95, before HTML 4.0 existed, so character
entity references followed SGML rules.

SGML existed before Unicode, so there was no universal character
encoding.  Hence, numeric character references always referred to the
character encoding specified in the SGML declaration.  This is why
named entities were always preferred when authoring SGML documents.
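
To illustrate the HTML/XML behaviour described above, here is a minimal
Python sketch.  It is not MHonArc code; the names REFERENCE, resolve, and
demo are made up for this example, and the three encodings are arbitrary
choices.  It only shows that a numeric character reference such as "&#254;"
resolves to the same Unicode character whatever encoding the page is
stored in, while a raw byte with the same value does not.

```python
import html

# Numeric character reference for code point 254 (U+00FE, the letter thorn).
REFERENCE = "&#254;"


def resolve(page_bytes: bytes, encoding: str) -> str:
    """Decode a page using its declared encoding, then expand references."""
    return html.unescape(page_bytes.decode(encoding))


def demo() -> dict:
    """Resolve the same reference from pages stored in different encodings."""
    results = {}
    for encoding in ("iso-8859-1", "cp1251", "utf-8"):
        # The reference itself is pure ASCII, so its bytes are identical
        # in all three encodings.
        page = REFERENCE.encode(encoding)
        results[encoding] = resolve(page, encoding)
    return results


if __name__ == "__main__":
    # Every entry is the same character, U+00FE.
    print(demo())
    # A raw byte 0xFE, by contrast, means different things per encoding.
    print(b"\xfe".decode("iso-8859-1"), b"\xfe".decode("cp1251"))
```

Under SGML rules the number would instead be looked up in the document's
own character set, which is the behaviour the old mappings assumed.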

I urge you to read the HSMA source and *.mrc files in order to understand the
rationales behind my suggestions. In the code I dealt with the above issues, 
issues that are not at all specific to Hebrew, and which, in my opinion, 
should be handled by MHA itself.

I understand the rationales, and some of the stuff I thought of
years ago.  You must realize that MHonArc is old (originally done in
Perl 4), so much of the API that you see today is patchwork, and not
ideal.  Hence, I have to make decisions on how much effort should be
put into patching the existing code base vs doing a complete rewrite.

As an example, I would like to have the ability for filters, or
"plugins" to be able to hook into the resource system of MHonArc
to define new resources, especially for use in resource files.
Unfortunately, this would require a complete rewrite of resource
management and would require some extra work to migrate older archives
without breaking compatibility.  A redo would definitely help in
adding new resources.  Right now, adding a new resource to MHonArc
requires the modification of multiple source files (e.g. mhopt.pl,
mhinit.pl, mhdb.pl, mhrcfile.pl, and maybe others depending on what
the resource does).
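
As a rough illustration of the kind of hook described above, here is a
small Python sketch of a central resource registry that a filter or plugin
could call to define a new resource in one place.  This is purely
hypothetical: it is not how MHonArc is implemented, and Resource,
ResourceRegistry, and the EXAMPLECHARSET resource name are all invented
for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def strip_whitespace(raw: str) -> str:
    """Default parser: trim surrounding whitespace from a raw value."""
    return raw.strip()


@dataclass
class Resource:
    """Hypothetical description of one resource: name, default, parser."""
    name: str
    default: str
    parse: Callable[[str], str] = strip_whitespace


class ResourceRegistry:
    """Single place where the core and plugins define and read resources."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}
        self._values: Dict[str, str] = {}

    def register(self, resource: Resource) -> None:
        # Names are treated case-insensitively in this sketch.
        key = resource.name.upper()
        if key in self._resources:
            raise ValueError(f"resource {key} is already defined")
        self._resources[key] = resource

    def set(self, name: str, raw: str) -> None:
        # Raises KeyError if no one registered the resource.
        key = name.upper()
        self._values[key] = self._resources[key].parse(raw)

    def get(self, name: str) -> str:
        # Fall back to the registered default when no value was set.
        key = name.upper()
        return self._values.get(key, self._resources[key].default)


if __name__ == "__main__":
    registry = ResourceRegistry()
    # A plugin defines its own (made-up) resource with a default value.
    registry.register(Resource(name="EXAMPLECHARSET", default="us-ascii"))
    registry.set("examplecharset", "  utf-8\n")
    print(registry.get("EXAMPLECHARSET"))
```

With something along these lines, option parsing, database storage, and
resource-file reading could all consult the registry instead of each
needing its own edit per resource.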

I've even considered designing a new version so it could be implemented
in Perl or Java.  However, it will take some work, and my motivation
is currently lacking to do a complete rewrite.  It's not like I get
paid to develop MHonArc.

As for some of your recent suggestions, they are good ones.
Some of them are not hard to do, just busy work.  It is highly likely
that the enhancements will be added across multiple releases.

--ewh

---------------------------------------------------------------------
To sign-off this list, send email to majordomo(_at_)mhonarc(_dot_)org with the
message text UNSUBSCRIBE MHONARC-DEV