nmh-workers

Re: Bug reported regarding Unicode handling in email address

2021-06-07 10:09:23
The address parser code is used for a lot of things.  The specific bug
report was about a draft message that contained Cyrillic characters.
We know what the character set was in THAT case, because it's a draft
message and we can derive the locale from the environment or the nmh
locale setting.  But if we are processing an email message then we
don't easily know the character set.  In theory it should either be
us-ascii or utf-8, but reality sometimes intrudes and it could be
anything.

If it's an email then won't it be ASCII?

Boy, you're out of the loop!  Check out RFC 6532.

I think really instead of using ctype macros, we should be using a
specific set of macros tailored for email addresses.

Isn't the problem that one routine is being used to parse emails which
should comply with the RFCs and also draft emails where it's up to nmh
to decide the allowable format?  We should be parsing ASCII-encoded
fields for display in the user's locale with one routine and
locale-encoded fields for transmission as ASCII with a second routine.

I mean ... yes?  Like many things there's a lot of overloading (see:
using email header parsing routines for config files).  But I think
in practice as long as we don't interpret non-ASCII bytes as "spaces"
we'll be fine.  Like I said, for parsing an email header we really
shouldn't be using ctype macros AT ALL but email-specific macros.
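
To make "email-specific macros" concrete, here is a rough sketch of
locale-independent classifiers.  The em_* names are hypothetical and
not part of nmh, and they are written as inline functions rather than
macros so the argument is evaluated only once.  The character classes
follow RFC 5322, with atext extended to non-ASCII bytes as RFC 6532
allows; this assumes any non-ASCII bytes are UTF-8, which real mail
does not always honor.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helpers, not nmh code.  Every test looks only at the
   byte value, so the result never depends on the current locale. */

/* RFC 5322 WSP: space or horizontal tab, nothing else. */
static inline bool em_iswsp(unsigned char c)
{
    return c == ' ' || c == '\t';
}

/* Any byte with the high bit set (assumed to be part of UTF-8). */
static inline bool em_isnonascii(unsigned char c)
{
    return c >= 0x80;
}

/* RFC 5322 "specials": delimiters that cannot appear in an atom. */
static inline bool em_isspecial(unsigned char c)
{
    return c != '\0' && strchr("()<>[]:;@\\,.\"", c) != NULL;
}

/* RFC 5322 atext (printable ASCII minus specials), plus non-ASCII
   bytes in the spirit of RFC 6532. */
static inline bool em_isatext(unsigned char c)
{
    return (c > 0x20 && c < 0x7f && !em_isspecial(c)) || em_isnonascii(c);
}

/* Skip leading whitespace using only the email definition of WSP. */
static const char *em_skip_wsp(const char *s)
{
    while (em_iswsp((unsigned char)*s))
        s++;
    return s;
}
```

With something like this, a byte such as 0xA0 (which can show up as a
UTF-8 continuation byte, and which isspace() may report as a space in
some single-byte locales) is never treated as whitespace, no matter
what the locale is set to.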

--Ken
