procmail

Re: Garbage vs Valid

2003-02-02 12:10:34
At 12:30 2003-02-02 +0100, Ruud H.G. van Tol did say:
Dallman Ross skribis:

> Let's check for those with nine consonants in a row:
>
>  11:34pm [~/Mail] 759[0]> egrep '[bcdfghjklmnpqrstvwxyz]{9}' /usr/share/dict/words
> Amblyrhynchus
> glycyphyllin
> Oxyrrhyncha
> oxyrrhynchid
> pachyrhynchous

All those words are more vowelly than you assume.

As was pointed out right from my first post, "AEIOU, and sometimes Y" should mean that y is always treated as a vowel for the purposes of a consonant-run test. I recall that the post you're quoting Dallman from also specifically mentioned excluding y, and "ph" as well.
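
To make the effect of treating y as a vowel concrete, here is a minimal sketch in Python (not procmail syntax). The function name longest_consonant_run is made up for illustration, and the word list is simply the five words from the quoted egrep output.

```python
import re

# Assumption: "y" is always a vowel, so it is left out of the consonant class.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]+")

def longest_consonant_run(word):
    """Return the length of the longest consonant run (y counts as a vowel)."""
    runs = CONSONANT_RUN.findall(word.lower())
    return max((len(run) for run in runs), default=0)

words = ["Amblyrhynchus", "glycyphyllin", "Oxyrrhyncha",
         "oxyrrhynchid", "pachyrhynchous"]
for w in words:
    print(w, longest_consonant_run(w))
```

With y removed from the class, none of the five quoted words comes near nine consonants in a row; the longest run in any of them is three.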

Including "ph" in an expression would be easier than excluding it. Although it is not a vowel, I'm including it in the following variable to demonstrate the syntax of the expression:

VOWEL=([aeiouy]|ph)

Thus, it would be easy enough to check for vowels. However, checking for the consonants as a character class not including ph is a bit more complicated. I welcome seeing someone else scribe that one at the moment, as I'm a bit tied up in making sure I'm available to someone for a server relocation project.
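
One way to express "a consonant, but not part of ph" is with lookaround assertions. As far as I know, procmail's egrep-style expressions do not offer those, so the following is only a Python sketch of the idea and not something to paste into a procmailrc. The names CONSONANT and longest_run_without_ph are hypothetical.

```python
import re

# One "consonant" unit, with "ph" treated like a vowel:
#   p(?!h)    a "p" that is not followed by "h"
#   (?<!p)h   an "h" that is not preceded by "p"
#   [...]     any other consonant ("y" is treated as a vowel)
CONSONANT = r"(?:p(?!h)|(?<!p)h|[bcdfgjklmnqrstvwxz])"
RUN = re.compile(CONSONANT + "+")

def longest_run_without_ph(word):
    """Longest consonant run, where "ph" breaks a run just as a vowel does."""
    runs = RUN.findall(word.lower())
    return max((len(run) for run in runs), default=0)

for w in ["alphabet", "pamphlet", "glycyphyllin", "earthlink"]:
    print(w, longest_run_without_ph(w))
```

Under these assumptions "alphabet" has no run longer than one, whereas a plain consonant class would count "lph" as three. "earthlink" still yields a run of four ("rthl").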

Also, concatenated words pose a peculiar problem and are exceedingly common in computer and internet use (the citation of "earthlink" is an example). That is a particular hurdle for my original suggestion of possibly looking for runs of three or more consonants. I suspect that "th" and "rh" should also be added to the "exclude me as a consonant" tests.
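
A rough way to try out the effect of also excluding "th" and "rh" is to blank out the digraphs first and then measure the consonant runs that remain. This is a Python sketch under that assumption, not procmail syntax; the digraph list and the name longest_run_ignoring_digraphs are illustrative only.

```python
import re

# Assumption: these digraphs are treated like vowels, i.e. they break a run.
DIGRAPHS = re.compile(r"ph|th|rh")
# "y" is treated as a vowel, so it is not in the consonant class.
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxz]+")

def longest_run_ignoring_digraphs(word):
    """Longest consonant run after replacing ph/th/rh with a separator."""
    stripped = DIGRAPHS.sub(" ", word.lower())
    runs = CONSONANT_RUN.findall(stripped)
    return max((len(run) for run in runs), default=0)

for w in ["earthlink", "rhythm", "strength"]:
    print(w, longest_run_ignoring_digraphs(w))
```

With these exclusions the longest run in "earthlink" drops from four ("rthl") to two ("nk"), though an ordinary word such as "strength" still reaches three ("str"), so a threshold of three remains easy to trip.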

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
