Re: Bug reported regarding Unicode handling in email address

It's early morning for me, and I'm still at least a liter of Diet Mountain Dew
away from being sufficiently caffeinated to be positive, but that looks like
"not totally correct, but a lot closer than what we have now".

In particular, that will accept overlong and otherwise illegal UTF-8
sequences, and it probably misbehaves on the stranger non-ASCII,
non-UTF-8 encodings such as ISO-2022-JP.
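
For anyone wondering what "overlong and illegal" means in practice, here
is a minimal sketch of a strict check. It is not nmh code; the name
utf8_strict_len() is made up for illustration. It follows the well-formed
byte ranges from the Unicode standard: the second byte of a sequence has a
narrower allowed range after certain lead bytes, which is what rules out
overlong forms, surrogates, and values above U+10FFFF.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of nmh.
 * Returns the length (1-4) of the well-formed UTF-8 sequence that starts
 * at s, or 0 if the bytes there are malformed or truncated.
 */
static size_t
utf8_strict_len(const unsigned char *s, size_t n)
{
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of 2nd byte */
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                         /* plain ASCII */

    if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        len = 2;                          /* 0xC0, 0xC1 would be overlong */
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        len = 3;
        if (s[0] == 0xE0) lo = 0xA0;      /* reject overlong 3-byte forms */
        if (s[0] == 0xED) hi = 0x9F;      /* reject UTF-16 surrogates */
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        len = 4;
        if (s[0] == 0xF0) lo = 0x90;      /* reject overlong 4-byte forms */
        if (s[0] == 0xF4) hi = 0x8F;      /* reject values above U+10FFFF */
    } else {
        return 0;                         /* stray continuation or bad lead */
    }

    if (n < len)
        return 0;                         /* truncated sequence */
    if (s[1] < lo || s[1] > hi)
        return 0;
    for (i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;
    return len;
}
```

A lenient parser that only looks at the lead byte to decide the length
would accept 0xC0 0xAF as a two-byte character; the check above rejects it.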


So, the DETAILS are complicated.

The address parser code is used for a lot of things.  The specific bug
report was about a draft message that contained Cyrillic characters.
We know what the character set was in THAT case, because it's a draft
message and we can derive the locale from the environment or the nmh
locale setting.  But if we are processing an email message, we don't
easily know the character set.  In theory it should be either us-ascii
or utf-8, but reality sometimes intrudes and it could be anything.

I really think that instead of using the ctype macros (whose results
depend on the current locale), we should be using a specific set of
macros tailored for email addresses, or a flex lexer designed to
process them.  I kind of think that we should simply pass the input
along as we are given it rather than trying to validate that it is
valid UTF-8 (for example).  ISO-2022-JP is SO complicated that I don't
think we should even try, and I get the sense everyone is migrating to
UTF-8 for email anyway.
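
To make that concrete, here is a rough sketch of what locale-independent
character classes for addresses might look like. None of these names
exist in nmh; addr_is_special() and addr_is_atext() are made up for
illustration, and they are written as functions rather than macros so the
argument is only evaluated once. The ASCII sets are the "specials" and
"atext" from RFC 5322. Treating every byte >= 0x80 as ordinary word text
is my assumption here, matching the "pass it along unvalidated" idea
(RFC 6532 extends atext with non-ASCII UTF-8 in a similar way). The range
comparisons assume an ASCII-compatible execution character set.

```c
#include <string.h>

/* Hypothetical helper: RFC 5322 "specials", independent of the locale. */
static int
addr_is_special(int c)
{
    unsigned char uc = (unsigned char)c;

    /* strchr() would match the terminating NUL, so exclude it first. */
    return uc != '\0' && strchr("()<>[]:;@\\,.\"", uc) != NULL;
}

/* Hypothetical helper: RFC 5322 "atext", plus any 8-bit byte. */
static int
addr_is_atext(int c)
{
    unsigned char uc = (unsigned char)c;

    if (uc >= 0x80)
        return 1;       /* assumption: pass 8-bit bytes through as-is */
    if ((uc >= 'A' && uc <= 'Z') || (uc >= 'a' && uc <= 'z') ||
        (uc >= '0' && uc <= '9'))
        return 1;
    return uc != '\0' && strchr("!#$%&'*+-/=?^_`{|}~", uc) != NULL;
}
```

With something like this, the bytes of a Cyrillic display name are always
"atext" and never get mistaken for a delimiter, whatever LC_CTYPE says.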

--Ken
