perl-unicode

[PATCH]s and questions [Encode] 1.30

2002-04-08 04:27:58
Hello, Dan!

1) [PATCH]
   Justification: http://www.unicode.org/unicode/faq/utf_bom.html#25

--- ext/Encode-1.30/lib/Encode/Unicode.pm.orig  Mon Apr  8 14:06:28 2002
+++ ext/Encode-1.30/lib/Encode/Unicode.pm       Mon Apr  8 14:49:24 2002
@@ -12,7 +12,7 @@
 sub FBCHAR(){ 0xFFFd }
 sub BOM_BE(){ 0xFeFF }
 sub BOM16LE(){ 0xFFFe }
-sub BOM32LE(){ 0xFeFF0000 }
+sub BOM32LE(){ 0xFFFe0000 }
 
 sub valid_ucs2($){
     if ($_[0] < 0xD800){
@@ -345,7 +345,7 @@
             16         32 bits/char
 -------------------------
 BE     0xFeFF 0x0000FeFF
-LE      0xFFeF 0xFeFF0000
+LE      0xFFeF 0xFFFe0000
 -------------------------
 
 =back
@@ -419,5 +419,7 @@
 =head1 SEE ALSO
 
 L<Encode>, L<http://www.unicode.org/glossary/>
+
+L<http://www.unicode.org/unicode/faq/utf_bom.html>
 
 =back
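
For reference, a minimal sketch of the byte-order reasoning behind patch 1. It assumes the BOM constants are compared against values read big-endian from the stream; that is my reading of the constants, not something checked against the rest of Unicode.pm. The BOM is U+FEFF, so when it is written little-endian and read back big-endian it appears byte-swapped:

```perl
use strict;
use warnings;

# U+FEFF serialized little-endian, then read back as a big-endian integer.
my $bom16le_seen = unpack('n', pack('v', 0xFEFF));   # bytes FF FE
my $bom32le_seen = unpack('N', pack('V', 0xFEFF));   # bytes FF FE 00 00

printf "16-bit LE BOM read as BE: 0x%04X\n", $bom16le_seen;   # 0xFFFE
printf "32-bit LE BOM read as BE: 0x%08X\n", $bom32le_seen;   # 0xFFFE0000
```

The 32-bit value comes out as 0xFFFE0000, which is the value the patch puts into BOM32LE.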

2) [PATCH], thanks to Philip Newton

--- E:\anth\tmp\perl\b2\ext\Encode-1.30\lib\Encode\Supported.pod.orig   Mon Apr  8 14:06:12 2002
+++ E:\anth\tmp\perl\b2\ext\Encode-1.30\lib\Encode\Supported.pod        Mon Apr  8 15:18:34 2002
@@ -592,7 +592,7 @@
 JIS has not endorsed the full Microsoft standard however.
 The official C<Shift_JIS> includes only JIS X 0201 and JIS X 0208
 subsets, while Microsoft has always been meaning C<Shift_JIS> to
-encode a wider character repertoire, see C<IANA> registration for
+encode a wider character repertoire. See C<IANA> registration for
 C<Windows-31J>.
 
 As a historical predecessor Microsoft's variant
@@ -600,7 +600,7 @@
 that Microsoft shouldn't have used JIS as part of the name
 in the first place.
 
-Unabiguous name: C<CP932>. C<IANA> name (not used?): C<Windows-31J>.
+Unambiguous name: C<CP932>. C<IANA> name (not used?): C<Windows-31J>.
 
 Encode separately supports C<Shift_JIS> and C<cp932>.
 




3) [QUESTION #1]
Isn't

sub data{
    my ($self, $data) = shift;
    defined $data and $self->{data} = $data;
    return $self;
}

just

sub data{
    return shift;
}
? Since shift returns only one value, $data is always undefined and
the assignment never happens. Please excuse me if I'm trolling.
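
A small sketch of the point in question #1, using plain hash references instead of the real Encode objects. The sub names data_as_written and data_from_args are made up for the illustration, and the second variant is only my guess at what was intended:

```perl
use strict;
use warnings;

# Accessor as quoted above: "shift" yields a single value, so $data
# is always undef and the hash is never updated.
sub data_as_written {
    my ($self, $data) = shift;
    defined $data and $self->{data} = $data;
    return $self;
}

# Presumably intended form: take both arguments from @_.
sub data_from_args {
    my ($self, $data) = @_;
    defined $data and $self->{data} = $data;
    return $self;
}

my $obj = { data => 'old' };

data_as_written($obj, 'new');
print "as written: $obj->{data}\n";   # as written: old

data_from_args($obj, 'new');
print "from \@_:    $obj->{data}\n";   # from @_:    new
```

So the sub as written does behave like "return shift", but the fix may be "= @_" rather than removing the body.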


4) [QUESTION #2] [STRATEGY]
I've seen in multiple places that the following is done:

$lc = lc $name

try $name
try $lc

Maybe it would be enough just to do

$lc = lc $name

try $lc

5) [QUESTION #3] [LONG-TERM STRATEGY]

(Not for Perl 5.8 :-)

Do you think the strategy

unless (find_in_cache $name){

    ($acronym = lc $name) =~ tr/- //d;

    try $acronym

    put_into_cache($name, $try_result);
}

work?

This will allow us to do without most of the aliases.
Of course 'KS C 5601' and 'KSC5601', 'GB 2312' and 'GB2312',
'JIS X 0208' and 'JISX0208' will never ever mean different
things then, but I think we will hardly ever want them to mean
different things.
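
A minimal sketch of the lookup proposed in question #3, to make the idea concrete. The %registry contents and the names normalize_name and find_by_name are hypothetical; this is not how Encode stores or resolves its encodings, just the caching and normalization step in isolation:

```perl
use strict;
use warnings;

# Hypothetical registry keyed by normalized names (lowercase, no '-' or ' ').
my %registry = (
    ksc5601  => 'codec for KS C 5601',
    gb2312   => 'codec for GB 2312',
    jisx0208 => 'codec for JIS X 0208',
);

my %cache;   # results keyed by the name exactly as the caller spelled it

# Lowercase the name and delete hyphens and spaces.
sub normalize_name {
    my ($name) = @_;
    (my $key = lc $name) =~ tr/- //d;
    return $key;
}

# Look up once per spelling; misses are cached as undef too.
sub find_by_name {
    my ($name) = @_;
    unless (exists $cache{$name}) {
        $cache{$name} = $registry{ normalize_name($name) };
    }
    return $cache{$name};
}

for my $name ('KS C 5601', 'KSC5601', 'gb-2312', 'JIS X 0208') {
    printf "%-12s => %s\n", $name, find_by_name($name) // '(not found)';
}
```

With this, 'KS C 5601' and 'KSC5601' resolve to the same entry without an explicit alias for either spelling.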

6) [FROLICKING AROUND] (while serious people do real work and do
                        not bother with nonsense :-)
Dan, do you think that
jisx0208-raw is better than jis0208-raw? (and jisx0208.ucm than
                                              jis0208.ucm ?)
If plan 5) is approved then it will mean that anyone can do
find_encoding('JIS X 0208-raw')

(You know, my first reason to subscribe to perl-unicode was
to discuss the 'JIS 0208' vs 'JIS X 0208' issue :-)

After all we have ksc5601-raw, not ks5601-raw.

/Anton/