perl-unicode

Re: questions about encode/decode

2007-10-15 15:30:03
On mailing lists, please write your reply below the quotation, and trim
the quotation to the minimum required for context. Thanks!


E R wrote 2007-10-15 17:01 (-0500):
> As a follow-up, does anyone have any suggestions about optimizing a
> routine such as this:
>
>   sub escapeHTML {

Probably the best optimization is to use the freely available
HTML::Entities module. It is part of the HTML-Parser distribution, which
is usually installed alongside LWP.
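
A minimal sketch of what that could look like, assuming HTML::Entities is
installed. The sample string and variable names are made up for
illustration. The second argument to encode_entities restricts escaping
to the listed characters, which is handy if you don't want every
non-ASCII character turned into an entity:

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Illustrative input, not taken from the original routine.
my $raw = q{<b>Fish & "chips"</b>};

# Escape only the characters that are significant in HTML markup.
my $escaped = encode_entities($raw, q{<>&"});

print "$escaped\n";
```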

>   $x =~ s/&/&amp;/g; $x =~ s/</&lt;/g;

Use a single regex, because each separate substitution has to scan the
entire string again. See HTML::Entities for inspiration if you don't want
to use the module (e.g. if you don't want the full spectrum of entities
that it supports).
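
A rough sketch of the single-regex approach: one character class finds
every special character and a hash supplies the replacement, so the
string is scanned once. The name escape_html and the choice of four
characters are assumptions for the example, not the original routine:

```perl
use strict;
use warnings;

# Hypothetical lookup table: only the characters significant in HTML markup.
my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
);

# escape_html is an illustrative name, not the routine from the question.
sub escape_html {
    my ($text) = @_;
    # One pass over the string: match any special character and
    # replace it with its entity from the table.
    $text =~ s/([&<>"])/$ENTITY{$1}/g;
    return $text;
}

print escape_html(q{<a href="x?a=1&b=2">}), "\n";
```

Because each character is looked at only once, this also avoids the
ordering problem of chained substitutions, where a later s/&/&amp;/g
could re-escape the ampersands produced by an earlier one.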

>   Encode::encode("iso-8859-1", $x);

It's very probably better to standardize on UTF-8 for your output. Doing
that now saves a lot of trouble later, when you need it. And sooner or
later, you will.
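
One illustration of the kind of trouble meant here, offered as an
assumption about the typical case rather than a complete argument:
ISO-8859-1 cannot represent characters outside its 256 code points. With
Encode's default settings, such a character is replaced by a substitution
character, while UTF-8 can encode it. The euro sign is just a sample:

```perl
use strict;
use warnings;
use Encode qw(encode);

# EURO SIGN (U+20AC) is a sample character that ISO-8859-1 lacks.
my $euro = "\x{20AC}";

my $latin1 = encode('iso-8859-1', $euro);  # falls back to a substitute
my $utf8   = encode('UTF-8',      $euro);  # three bytes, no loss

# Show the resulting bytes in hex so the difference is visible.
for my $pair ([ 'iso-8859-1', $latin1 ], [ 'UTF-8', $utf8 ]) {
    my ($name, $bytes) = @$pair;
    my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
    print "$name: $hex\n";
}
```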

> Basically I'm concerned about the overhead to constantly look up the
> encoder sub for every fragment of HTML I need to escape.

Encode your output once, when outputting. PerlIO layers help to automate
this and save a lot of development time:

    binmode STDOUT, ":encoding(UTF-8)";
    print $foo;  # automatically encoded!
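
To make the effect of the layer visible, here is a small sketch that
writes to an in-memory filehandle instead of STDOUT, purely so the
resulting bytes can be inspected. The sample text and variable names are
assumptions. The string stays a character string throughout and is
encoded exactly once, on output:

```perl
use strict;
use warnings;

# Sample character string: "caf", e-acute, space, U+263A (a smiley).
my $text = "caf\x{e9} \x{263A}";

# In-memory filehandle with the same kind of encoding layer that
# binmode would put on STDOUT.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer
    or die "cannot open in-memory handle: $!";

print {$out} $text;   # encoded by the layer, no explicit encode() call
close $out;

printf "%d characters became %d bytes\n", length($text), length($buffer);
```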
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <#####(_at_)juerd(_dot_)nl>  <http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy <sales(_at_)convolution(_dot_)nl>
