nmh-workers

Re: Bug reported regarding Unicode handling in email address

2021-06-07 08:55:14
Hi Ken,

It's early morning for me, and I'm still at least a liter of Diet
Mountain Dew away from being sufficiently caffeinated to be
positive, but that looks like "not totally correct, but a lot closer
than what we have now".

In particular, that will accept overlong and otherwise illegal UTF-8
sequences, and it probably misbehaves on strange and unusual
non-ASCII/non-UTF-8 encodings like ISO-2022-JP.
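
For illustration only, a stricter check might look roughly like the
sketch below. It is not nmh code, and the name utf8_seq_len is
hypothetical. It rejects overlong forms, UTF-16 surrogates, and values
above U+10FFFF, and it does nothing for stateful encodings such as
ISO-2022-JP.

```c
#include <stddef.h>

/*
 * Return the length (1-4) of the well-formed UTF-8 sequence starting
 * at s, or 0 if the bytes are not well-formed.  n is the number of
 * bytes available.  Hypothetical helper, not part of nmh.
 */
size_t
utf8_seq_len(const unsigned char *s, size_t n)
{
    size_t len, i;
    unsigned long cp, min;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                       /* plain ASCII */

    if ((s[0] & 0xE0) == 0xC0) {
        len = 2; cp = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; cp = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; cp = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                       /* stray continuation or 0xF8..0xFF */
    }

    if (n < len)
        return 0;                       /* truncated sequence */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if (cp < min)
        return 0;                       /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* UTF-16 surrogate */
    if (cp > 0x10FFFF)
        return 0;                       /* beyond the Unicode range */

    return len;
}
```

A caller would step through a buffer by the returned length and treat
a return of 0 as "not valid UTF-8 at this position".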

So, the DETAILS are complicated.

This is nmh.  :-)

The address parser code is used for a lot of things.  The specific bug
report was about a draft message that contained Cyrillic characters.
We know what that character set was in THAT case, because it's a draft
message and we can derive the locale from the environment or the nmh
locale setting.  But if we are processing an email message then we
don't easily know the character set.  In theory it should either be
us-ascii or utf-8, but reality sometimes intrudes and it could be
anything.

If it's an email then won't it be ASCII?

I think really instead of using ctype macros, we should be using a
specific set of macros tailored for email addresses.
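
As a rough sketch of what such address-specific classifiers could look
like, here are two locale-independent predicates for the RFC 5322
"atext" and "specials" sets. The names addr_is_atext and
addr_is_special are hypothetical, they are written as functions rather
than macros for readability, and the range tests assume an
ASCII-compatible execution character set. Bytes above 0x7F are simply
rejected here; whether to admit them (as RFC 6532 does for UTF-8) is a
separate decision.

```c
#include <string.h>

/*
 * Hypothetical replacements for ctype macros in an address parser.
 * Unlike isalpha() and friends, the result never depends on the
 * current locale.
 */

/* RFC 5322 atext: ALPHA / DIGIT / a fixed set of punctuation. */
int
addr_is_atext(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return 1;
    if (c >= 'A' && c <= 'Z')
        return 1;
    if (c >= '0' && c <= '9')
        return 1;
    /* strchr() would match the terminating NUL, so exclude it. */
    return c != '\0' && strchr("!#$%&'*+-/=?^_`{|}~", c) != NULL;
}

/* RFC 5322 specials: characters with syntactic meaning in addresses. */
int
addr_is_special(unsigned char c)
{
    return c != '\0' && strchr("()<>[]:;@\\,.\"", c) != NULL;
}
```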

Isn't the problem that one routine is being used to parse emails which
should comply with the RFCs and also draft emails where it's up to nmh
to decide the allowable format?  We should be parsing ASCII-encoded
fields for display in the user's locale with one routine and
locale-encoded fields for transmission as ASCII with a second routine.

-- 
Cheers, Ralph.
