Robert Elz writes:
> A generalisation, yes, but an incorrect one: single quotes are not
> special in RFC 822, and shouldn't be. A name like O'Toole is a
> perfectly valid name for a person to have. What's more, a name field
> may contain two people with ' characters in their names (O'Toole &
> O'Sullivan), which shouldn't turn into OToole & OSullivan. Only "
> quotes strings; ' is just an ordinary character.

Thanks for your comments.
What you say is completely correct according to the RFC. But look at
the following headers from a message in my inbox:

To: "'C Mummert'" <mummert(_at_)math(_dot_)psu(_dot_)edu>
X-Mailer: Microsoft Outlook, Build 10.0.4510

Messages like that are why I added single quotes to the switch
statement.

> I half suspect, just from reading it, that a lone \ at the end of the
> string (incorrect, but still possible) might break your code too.

This needs to be fixed. I was applying unquote to address components,
and I have convinced myself that getadrx() will never leave a
trailing \ in a string. But someone could mistakenly apply the unquote
function to a string that isn't RFC 2822-encoded (such as the Subject
field). I hadn't thought of that.

> The algorithm could probably also be improved. There's no need to find
> and remove the closing " in a string and then go back and process it
> all again; just copy to the output buffer while seeking the ". Quoted
> strings don't nest (with only one quoting character, they cannot), so
> there's no need to allow for that possibility.

Headers like the Outlook one above are my motivation for handling
nested quotes correctly. Once nested quotes are ruled out, the method
you describe is faster and simpler to implement.
If anyone is actually interested in using my code for unquoting strings,
I am willing to implement the changes Robert Elz suggests, which would
make the parser follow the RFC exactly.
Nmh-workers mailing list