procmail

Re: Embedded comments

2003-05-24 11:05:51
> Maybe it should be * -1^1 ()\< to stop it thinking I am quoting the <.

Yes, Alan, it should.

However, \< is not actually a word boundary. It matches a single character (possibly a newline) that wouldn't be in a word, so it will count periods, apostrophes, and who knows what else. If I end a sentence with a question mark and a quotation mark, leave two spaces, and then start the next sentence, that makes four matches of \< for the start of only one new word. If you really want to count starts of words, you'll need something like this (given that you want to subtract 1 for each):

 * -1^1 ()\<[a-z0-9]
 *  1^1 [a-z0-9]'[a-z]

so that contractions, possessives, and forms like "p's and q's" don't get oversubtracted because of their apostrophes. Then I suppose we have decimals to consider:

 *  1^1 [0-9]\.[0-9]
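
For anyone who wants to check the arithmetic outside procmail, here is a rough Python sketch of the same tally. It is only an approximation, not how procmail evaluates scores: it assumes \< can be modelled as one character that is not a letter, digit, or underscore, it matches case-insensitively as procmail does by default, and it reports a positive word count instead of subtracting from a score. The names NONWORD, count, and word_starts and the sample strings are made up for illustration.

```python
import re

# Assumption: procmail's \< is approximated as one non-word character.
NONWORD = r"[^A-Za-z0-9_]"


def count(pattern, text):
    """Number of non-overlapping, case-insensitive matches of pattern."""
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def word_starts(text):
    """Tally word starts the way the three conditions above would."""
    starts = count(NONWORD + r"[a-z0-9]", text)       # ()\<[a-z0-9]
    apostrophes = count(r"[a-z0-9]'[a-z]", text)      # [a-z0-9]'[a-z]
    decimals = count(r"[0-9]\.[0-9]", text)           # [0-9]\.[0-9]
    # Each apostrophe or decimal point inside a word was counted once
    # as a false word start, so take those back out.
    return starts - apostrophes - decimals


# The leading newline stands in for the start of a line.
sample = "\nIt's 3.5 p's and q's.\n"
print(word_starts(sample))
```

On the sample string the first pattern alone finds nine matches, and the two corrections bring that down to the five words actually present. Counting NONWORD by itself on a string like 'done?"  Next' gives four, which is the overcount described above for a bare \<.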


