mhonarc-dev

Re: RFC: Japanese Text Conversion and other language issues

2002-12-03 19:27:03
From: Earl Hood <earl(_at_)earlhood(_dot_)com>
Subject: Re: RFC: Japanese Text Conversion and other language issues
Date: Mon, 02 Dec 2002 23:16:37 -0600
(2) Human unreadable (i.e., poor maintainability)
    Imagine if `Hello' were written as
    `&#x48;&#x65;&#x6c;&#x6c;&#x6f;'.
    You might say `The files generated by MHonArc don't need
    to be viewed except via web browsers'.
    Nevertheless, it is also true that sometimes I needed to
    see them for maintenance.
...
Your comment would also apply if all data is in UTF-8 (unless of
course you have access to a UTF-8-aware editor/viewer).

We have some UTF-8-aware editors/viewers, for example,

Emacs + MULE-UCS: http://www.m17n.org/mule/
This combination is very popular in Japan (MULE stands for
Multilingual Environment).
It supports UTF-8 and UTF-16 (both LE and BE, of course) and many
other encodings:
  http://www.m17n.org/mule/gifs/Sample.gif

JVim/Vim:
Japanized(?) Vim, which supports UTF-8.
It seems recent Vim itself also supports UTF-8.

lv: http://www.ff.iij4u.or.jp/~nrt/lv/
This is a less-like file viewer that supports UTF-8.
It can also be used as a multilingual grep.


Thus, converting to UTF-8 is more acceptable for me than
converting to entity references.
Still, I'd prefer iso2022.pl as default.
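
To make the readability trade-off concrete, here is a minimal Python
sketch (not MHonArc code; the function names to_char_refs and
from_char_refs are hypothetical, chosen only for illustration). It
encodes text as hexadecimal numeric character references, as in the
`Hello' example above, and decodes it again with the standard html
module.

```python
import html


def to_char_refs(text: str) -> str:
    """Encode every character as a hexadecimal numeric character reference.

    Hypothetical helper for illustration; a real converter would
    normally leave plain ASCII characters untouched.
    """
    return "".join(f"&#x{ord(ch):x};" for ch in text)


def from_char_refs(markup: str) -> str:
    """Decode character references back into text (standard library call)."""
    return html.unescape(markup)


if __name__ == "__main__":
    refs = to_char_refs("Hello")
    print(refs)                  # &#x48;&#x65;&#x6c;&#x6c;&#x6f;
    print(from_char_refs(refs))  # Hello

    # The same text stored as UTF-8 stays readable in a UTF-8-aware
    # editor, while the reference form needs a browser or a decoder.
    japanese = "日本語"
    print(to_char_refs(japanese))
    print(japanese.encode("utf-8"))
```

The round trip loses nothing, so the choice between the two forms is
about who or what has to read the stored files, not about the data.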


(3) Some software cannot read it.
    This also concerns maintainability.
Yep, but it may be a hit that needs to be taken in order to
solve charset soup.

BTW, can you provide some real-world example software (besides Namazu)?

All of the software above.
In addition,

mg (multi-line grep):
This is a kind of grep.
The matching is done across the line boundaries, as its name
shows.
Furthermore, mg can be used for Japanese string search (but it
does not support UTF-8, much less entity references).
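
As a rough illustration of matching across line boundaries, the
Python sketch below allows an optional line break between any two
characters of the search string. This is only an assumption about how
such a search could be done; it is not mg's actual implementation,
and the name multiline_search is hypothetical.

```python
import re
from typing import List, Tuple


def multiline_search(text: str, needle: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans where needle occurs in text, allowing a
    single line break between any two adjacent characters of needle.

    Hypothetical helper for illustration only.
    """
    if not needle:
        return []
    # Escape each character, then permit one optional LF or CRLF
    # between neighbouring characters.
    pattern = r"(?:\r?\n)?".join(re.escape(ch) for ch in needle)
    return [m.span() for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    # The word is split by a line break after its second character.
    wrapped = "日本\n語のテキスト"
    print(multiline_search(wrapped, "日本語"))  # [(0, 4)]
```

Because the pattern operates on decoded characters, a search like this
would not find the same word once it has been rewritten as entity
references.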


------------------------------------------------------------

I recognized another advantage of using entity references:
we can use Kanji characters in the rc file.
...
Have you tried using the VARREGEX resource to minimize rc file
conflicts?

Oops, I didn't know this.
I'll try later.


However, please wait for a few days because I have a bad
cold now...
Hope you get better,

Thanks.

-- 
Takashi P.KATOH
