Hello, Dan!
1) [PATCH]
Justification: http://www.unicode.org/unicode/faq/utf_bom.html#25
--- ext/Encode-1.30/lib/Encode/Unicode.pm.orig Mon Apr 8 14:06:28 2002
+++ ext/Encode-1.30/lib/Encode/Unicode.pm Mon Apr 8 14:49:24 2002
@@ -12,7 +12,7 @@
sub FBCHAR(){ 0xFFFd }
sub BOM_BE(){ 0xFeFF }
sub BOM16LE(){ 0xFFFe }
-sub BOM32LE(){ 0xFeFF0000 }
+sub BOM32LE(){ 0xFFFe0000 }
sub valid_ucs2($){
if ($_[0] < 0xD800){
@@ -345,7 +345,7 @@
16 32 bits/char
-------------------------
BE 0xFeFF 0x0000FeFF
-LE 0xFFeF 0xFeFF0000
+LE 0xFFFe 0xFFFe0000
-------------------------
=back
@@ -419,5 +419,7 @@
=head1 SEE ALSO
L<Encode>, L<http://www.unicode.org/glossary/>
+
+L<http://www.unicode.org/unicode/faq/utf_bom.html>
=back
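As a quick cross-check of the new constant, here is a sketch in Python (not Encode's code). It assumes Encode::Unicode compares BOM16LE/BOM32LE with the leading bytes read as a big-endian number, as Perl's unpack 'n' / 'N' would do. Serializing U+FEFF little-endian and reading the bytes back big-endian gives 0xFFFe and 0xFFFe0000, not 0xFeFF0000:

```python
import codecs
import struct

# Serialize U+FEFF (the BOM code point) in little-endian order.
bom16le = struct.pack("<H", 0xFEFF)  # b'\xff\xfe'
bom32le = struct.pack("<I", 0xFEFF)  # b'\xff\xfe\x00\x00'

# Sanity check against the BOM constants shipped in Python's codecs module.
assert bom16le == codecs.BOM_UTF16_LE
assert bom32le == codecs.BOM_UTF32_LE

# Assumption: the Perl constants are compared with the first bytes read
# big-endian (like unpack 'n' / 'N'), so read them back the same way here.
BOM16LE = int.from_bytes(bom16le, "big")
BOM32LE = int.from_bytes(bom32le, "big")

print(hex(BOM16LE), hex(BOM32LE))  # 0xfffe 0xfffe0000
```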
2) [PATCH], thanks to Philip Newton
--- E:\anth\tmp\perl\b2\ext\Encode-1.30\lib\Encode\Supported.pod.orig Mon Apr 8 14:06:12 2002
+++ E:\anth\tmp\perl\b2\ext\Encode-1.30\lib\Encode\Supported.pod Mon Apr 8 15:18:34 2002
@@ -592,7 +592,7 @@
JIS has not endorsed the full Microsoft standard however.
The official C<Shift_JIS> includes only JIS X 0201 and JIS X 0208
subsets, while Microsoft has always been meaning C<Shift_JIS> to
-encode a wider character repertoire, see C<IANA> registration for
+encode a wider character repertoire. See C<IANA> registration for
C<Windows-31J>.
As a historical predecessor Microsoft's variant
@@ -600,7 +600,7 @@
that Microsoft shouldn't have used JIS as part of the name
in the first place.
-Unabiguous name: C<CP932>. C<IANA> name (not used?): C<Windows-31J>.
+Unambiguous name: C<CP932>. C<IANA> name (not used?): C<Windows-31J>.
Encode separately supports C<Shift_JIS> and C<cp932>.
3) [QUESTION #1]
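Isn't
  sub data{
      my ($self, $data) = shift;
      defined $data and $self->{data} = $data;
      return $self;
  }
just
  sub data{
      return shift;
  }
? Since shift returns only one element, $data is always undef and the
assignment never happens. Perhaps my ($self, $data) = @_; was intended.
Please excuse me if I'm trolling.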
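To make the point concrete, here is a tiny Python model of Perl's list assignment. The helper perl_list_assign is hypothetical, written only for this illustration, and is not part of Perl or Encode. The right-hand side of my ($self, $data) = shift; supplies a single value, so the second target stays undef (None here):

```python
def perl_list_assign(n_targets, values):
    """Hypothetical model of Perl's `my (...) = LIST`: targets are filled in
    order, surplus values are dropped, missing ones become undef (None)."""
    values = list(values)[:n_targets]
    return tuple(values + [None] * (n_targets - len(values)))


args = ["$self", "new data"]  # stands in for @_ in $obj->data("new data")

# my ($self, $data) = shift;   shift yields one scalar, so $data stays undef
print(perl_list_assign(2, args[:1]))  # ('$self', None)

# my ($self, $data) = @_;      both arguments are assigned
print(perl_list_assign(2, args))      # ('$self', 'new data')
```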
4) [QUESTION #2] [STRATEGY]
I've seen the following done in multiple places:
  $lc = lc $name
  try $name
  try $lc
Maybe it would be enough to do just
  $lc = lc $name
  try $lc
(assuming all names and aliases are registered in lower case).
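A minimal Python sketch of the two lookup orders, assuming a hypothetical registry whose keys are all lower case (REGISTRY, find_current and find_proposed are illustrative names, not Encode's API):

```python
# Hypothetical registry; the proposal assumes every name and alias is
# registered in lower case.
REGISTRY = {"shift_jis": "Shift_JIS table", "cp932": "CP932 table",
            "iso-8859-1": "Latin-1 table"}


def find_current(name):
    """Current practice: try the name as given, then its lower-cased form."""
    if name in REGISTRY:
        return REGISTRY[name]
    return REGISTRY.get(name.lower())


def find_proposed(name):
    """Proposal: try only the lower-cased form."""
    return REGISTRY.get(name.lower())


for n in ["Shift_JIS", "shift_jis", "CP932", "ISO-8859-1", "no-such-name"]:
    assert find_current(n) == find_proposed(n)
```

The two agree only because every key is lower case; a key registered with upper-case letters would be found by find_current but missed by find_proposed.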
5) [QUESTION #3] [LONG-TERM STRATEGY]
(Not for Perl 5.8 :-)
What do you think: will the strategy
  unless (find_in_cache $name){
      ($acronym = lc $name) =~ tr/- //d;
      try $acronym
      put_into_cache($name, $try_result);
  }
work?
This will allow us to do without most of the aliases.
Of course 'KS C 5601' and 'KSC5601', 'GB 2312' and 'GB2312',
'JIS X 0208' and 'JISX0208' will never ever mean different
things then, but I think we will hardly ever want them to mean
different things.
6) [FROLICKING AROUND] (while serious people do real work and do
not bother with nonsense :-)
Dan, do you think that
jisx0208-raw is better than jis0208-raw (and jisx0208.ucm better than
jis0208.ucm)?
If plan 5) is approved, anyone will be able to do
find_encoding('JIS X 0208-raw')
(You know, my first reason for subscribing to perl-unicode was
to discuss the 'JIS 0208' vs 'JIS X 0208' issue :-)
After all, we have ksc5601-raw, not ks5601-raw.
/Anton/