nmh-workers

Re: [Nmh-workers] General question - unsupported charset conversion

2014-02-28 17:53:59
Ken Hornstein writes:

Unfortunately, I have a lot of experience of, and trouble with,
character set conversion.

Well, if you just bit the bullet and switched to UTF-8, you wouldn't have
all of these problems! :-)

It is not that simple. UTF-8 solves a couple of problems but creates
some new ones. =:-) The advantages and disadvantages of UTF-8 are a
very wide topic.


In practice it means spam in an exotic language, and at that point I
know that I do not want to read such a message.

I can see that, but I'm not sure that's an appropriate choice for all
cases (like, for instance, MIME parameters).

That is right. On the other hand, you can never prevent malformed MIME
parameters.


This is very frequent and causes a lot of trouble: an entire message in
English with one foreign family name in its original spelling. The
message is sent in UTF-8 but (suppose) my terminal supports only ASCII.
Conversion would fail.

Errr ... really? In the case I'm thinking of, the one foreign family
name would have the offending character output as a '?' (or whatever).
The conversion would go through fine.

Well, it depends on the meaning of the word "fail". Formally, it is not
possible to map every UTF-8 character onto the 256 characters of an
8-bit iso/cp/... set. Conversion would fail.
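
To make the formal sense of "fail" concrete, here is a minimal sketch
using Python's built-in codecs as a stand-in (it is not nmh code, and
the sample name and the helper strict_convert are only assumptions for
illustration). A strict conversion refuses the whole text as soon as
one character has no mapping in the target set:

```python
# Hypothetical sample: English text with one foreign family name.
text = "Dear Mr. Gołębiowski, thank you for your message."


def strict_convert(s, charset):
    """Return the encoded bytes, or None if any character has no mapping."""
    try:
        return s.encode(charset)  # errors="strict" is the default
    except UnicodeEncodeError:
        return None


# ASCII has no 'ł' or 'ę', so the strict conversion fails outright.
result_ascii = strict_convert(text, "ascii")

# ISO-8859-2 happens to contain both letters, so this one succeeds.
result_latin2 = strict_convert(text, "iso-8859-2")

print(result_ascii is None, result_latin2 is not None)
```

Whether a given message converts at all therefore depends entirely on
which 8-bit set the terminal happens to use.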

Ignoring absent symbols, or substituting something else for them, is
what lets the conversion go through fine.

But ignoring symbols or substituting '?' for them makes the conversion
non-reversible, and the result may be difficult to read.

It is not a problem when one or two symbols are missing or substituted
in a long text. We can guess what the me?ning of the word is. With
many non-convertible symbols, reading such a text is more like solving
a crossword puzzle. What could '??o??w??d' be?
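
A small sketch of why this is non-reversible, again using Python's
codec error handlers purely as an illustration (the names below are
made-up samples, not taken from any real message). Two different
originals collapse into the same output once every unmappable
character becomes '?':

```python
name_a = "Gołębiowski"   # hypothetical sample
name_b = "Gośćbiowski"   # a different, equally hypothetical sample

# Substitute '?' for every character that ASCII cannot represent.
replaced_a = name_a.encode("ascii", errors="replace").decode("ascii")
replaced_b = name_b.encode("ascii", errors="replace").decode("ascii")

# Silently drop every character that ASCII cannot represent.
ignored_a = name_a.encode("ascii", errors="ignore").decode("ascii")

print(replaced_a)                # Go??biowski
print(ignored_a)                 # Gobiowski
print(replaced_a == replaced_b)  # True: the originals are no longer distinguishable
```

With "ignore" the reader does not even see how many characters were
lost, so the '?' form is at least the more honest of the two.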
 
In my personal opinion a very good choice is conversion into
HTML entities, like &aogon; for ą or &lstrok; for ł. It remains quite
readable and is still unambiguous enough to convert back if needed.
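
As a rough sketch of that idea: Python's standard library has an
"xmlcharrefreplace" error handler that emits numeric character
references rather than named entities, and html.unescape() to go back.
This is only an illustration under those assumptions, not something
nmh or iconv provides; the sample name is made up.

```python
import html

original = "Gołębiowski"   # hypothetical sample

# Replace each unmappable character with a numeric character reference.
escaped = original.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Convert the references back into the original characters.
restored = html.unescape(escaped)

print(escaped)               # Go&#322;&#281;biowski
print(restored == original)  # True
```

One caveat: if the original text already contains something that looks
like an entity (a literal "&#322;", say), converting back would change
it, so the round trip is only safe when '&' is escaped as well.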

Um, ouch.  Unless there's a common library that already implements
that behavior, that's not on the table at all.

This is a serious argument. However, the Recode library mentioned
earlier has something like that:
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT

I do not know whether it is useful or not.

        max

