Re: RFC: Japanese Text Conversion and other language issues

2002-12-03 19:27:03
From: Earl Hood <earl(_at_)earlhood(_dot_)com>
Subject: Re: RFC: Japanese Text Conversion and other language issues
Date: Mon, 02 Dec 2002 23:16:37 -0600
(2) Human-unreadable (i.e., poor maintainability)
    Imagine if `Hello' were written as
    You might say `The files generated by MHonArc don't need
    to be viewed except via web browsers'.
    Nevertheless, it is also true that sometimes I need to
    view them directly for maintenance.
Your comment would also apply if all data is in UTF-8 (unless, of
course, you have access to a UTF-8-aware editor/viewer).
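As a rough illustration of point (2), here is a minimal Python sketch, not MHonArc's own (Perl) conversion code, assuming every non-ASCII character is replaced with a decimal numeric character reference. `to_ncr` is a hypothetical helper name.

```python
def to_ncr(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character
    reference (e.g. "こ" -> "&#12371;"); ASCII characters pass through."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


greeting = "こんにちは"  # Japanese for "Hello"
print(to_ncr(greeting))  # &#12371;&#12435;&#12395;&#12385;&#12399;
print(to_ncr("Hello"))   # Hello
```

Python's built-in `str.encode("ascii", "xmlcharrefreplace")` yields the same decimal references; the explicit loop just makes the mapping visible. Either way, five Japanese characters turn into forty characters that a plain text editor shows only as digits and punctuation.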

We have some UTF-8-aware editors/viewers, for example,

Emacs + MULE-UCS:
This combination is very popular in Japan (MULE stands for
Multilingual Environment).
This supports UTF-8, UTF-16 (both LE and BE, of course), and many other encodings.

Japanized(?) Vim, which supports UTF-8.
Recent versions of Vim itself also seem to support UTF-8.

This is a less-like file viewer, which supports UTF-8.
It can also be used as a multilingual grep.

Thus, converting to UTF-8 is more acceptable to me than
converting to entity references.
Still, I'd prefer as default.

(3) Some software cannot read it.
    This also concerns maintainability.
Yep, but it may be a hit that needs to be taken in order to
solve charset soup.
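Point (3) can be illustrated the same way. Below is a minimal Python sketch, assuming a line of archived text in which a Japanese word was stored as decimal character references (the sample line is made up). It shows why a tool that searches the raw text misses the word, and that decoding with the standard `html.unescape` first recovers it.

```python
import html

needle = "日本語"  # "Japanese language"

# Made-up archived line in which the word was stored as decimal
# numeric character references instead of raw characters.
archived_line = "Subject: &#26085;&#26412;&#35486; test"

# A plain substring search (roughly what a grep-like tool does) misses it.
print(needle in archived_line)  # False

# Decoding the references first makes the word findable again.
print(needle in html.unescape(archived_line))  # True
```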

BTW, can you provide some real-world example software (besides Namazu)?

All of the above software.
In addition,

mg (multi-line grep):
This is a kind of grep.
The matching is done across line boundaries, as its name suggests.
Furthermore, mg can be used for Japanese string search (but it does
not support UTF-8, much less entity references).
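mg's own implementation is not part of this thread; the following is only a minimal Python sketch of the idea of matching across line boundaries, assuming text was wrapped without inserting spaces (as is common in Japanese). `multiline_pattern` is a hypothetical helper.

```python
import re


def multiline_pattern(word: str) -> re.Pattern:
    """Compile a regex that matches `word` even when a line break
    (optionally followed by indentation) falls between any two characters.
    Hypothetical helper illustrating the idea; not mg's implementation."""
    gap = r"(?:\r?\n[ \t]*)?"  # optional line break plus leading spaces/tabs
    return re.compile(gap.join(re.escape(ch) for ch in word))


# Japanese is usually wrapped without spaces, so a word can be split
# across two lines, as "日本語" is here.
text = "これは日本\n語のテキストです。"
pattern = multiline_pattern("日本語")

print("日本語" in text)      # False: a plain substring search misses it
match = pattern.search(text)
print(repr(match.group()))   # '日本\n語'
```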


I recognized another advantage of using entity
references: we can use Kanji characters in the rc file.
Have you tried using the VARREGEX resource to minimize rc file

Oops, I didn't know this.
I'll try it later.

However, please wait for a few days because I have a bad
cold now...
Hope you get better,


Takashi P.KATOH

