procmail

Re: Stripping extra stuff from text

2015-04-30 21:05:25
Greetings, @lbutlr!

MSGTEXT=`/usr/local/bin/formail -I ""`
SMSTEXT=`echo $MSGTEXT | lynx --dump --dont_wrap_pre -stdin | tr '\n' ' ' | /usr/bin/cut -c1-140`

I ended up with this, but I don’t especially like it:

SMSTEXT=`echo $MSGTEXT | lynx --dump --dont_wrap_pre -stdin | sed -e 's/^  //' | sed -e 's/[-=_]//g' | tr '\n' ' ' | cut -c1-140`

SMSTEXT="$(echo $MSGTEXT | lynx -dump -stdin -assume_charset=UTF-8 | sed -zre 's/[[:space:]]+/ /g; s/^[[:space:]]|[-=_]//g; s/^(.{,140}).*/\1/')"

I don't see why you are creating four pipes instead of one. Since you are
already using sed, let it do the whole job.

Also, you can't reliably get 140 bytes (and I assume you want exact bytes,
since you are trying to send an SMS) out of a Unicode string with the tools
you are using without risking data integrity: a cut at a fixed byte offset
can land in the middle of a multibyte character.
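
To illustrate the problem, here is a small sketch that cuts a UTF-8 string at
a fixed byte offset and then checks whether the result is still well-formed.
The helper name is_valid_utf8 is hypothetical, and the example assumes that
iconv(1) and a head that supports -c are installed.

```shell
# Hypothetical check: succeeds only if stdin is well-formed UTF-8.
# iconv exits non-zero when it meets an invalid or incomplete sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# "é" is two bytes in UTF-8 (0xC3 0xA9, written in octal below).
# Keeping only the first 2 bytes of "aé" cuts that character in half.
printf 'a\303\251' | head -c 2 | is_valid_utf8 ||
    echo "truncated in the middle of a character"
```

The same thing can happen at byte 140 of a real message whenever a multibyte
character straddles the limit.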

Either convert the data to UTF-16 (or UCS-2) before cutting the first 70
characters off the top, or you need a complete and elaborate script to detect
and possibly translate the encoding.
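
A minimal sketch of the first option, assuming the text is already valid
UTF-8 and that iconv(1) and head -c are available. The function name
sms_truncate is made up for illustration. Characters outside the Basic
Multilingual Plane take two UTF-16 code units, so one of them could still be
split at the boundary; this sketch does not handle that case.

```shell
# Hypothetical helper: keep at most 70 UTF-16 code units (140 bytes) of stdin
# and hand the result back as UTF-8.
sms_truncate() {
    # UTF-16BE has a fixed byte order and iconv writes no byte-order mark
    # for it, so every code unit is exactly 2 bytes.
    iconv -f UTF-8 -t UTF-16BE |
        head -c 140 |                 # 140 bytes = 70 code units
        iconv -f UTF-16BE -t UTF-8    # back to UTF-8 for the rest of the pipeline
}

# Example: "h", "é" (0xC3 0xA9) and "llo" fit easily and pass through unchanged.
printf 'h\303\251llo' | sms_truncate
```

In the pipeline above this would replace the final cut (or the last sed
expression), after whitespace has already been collapsed.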


-- 
With best regards,
Andrey Repin
Friday, May 1, 2015 04:06:29

Sorry for my terrible english...

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail
