Re: iso-2022-jp, adding encodings..

2001-06-15 07:55:55
Benjamin Franz <snowhare(_at_)nihongo(_dot_)org> writes:
On Thu, 14 Jun 2001, Edward Peschko wrote:

All I'm trying to do is convert from UTF8 to iso-2022-jp (the form of shift
jis that is used in email...) any help on how to do this would be greatly 

Don't mix up JIS encoding (formerly JUNET encoding; iso-2022-jp), which
is a 7-bit encoding that switches character sets with escape sequences,
with Shift-JIS (sjis), which uses 8-bit bytes and no escapes. For email,
iso-2022-jp (JIS encoding) is usually used. For internal processing,
sane people usually don't use JIS encoding.
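
To make the distinction concrete, here is a minimal sketch that encodes the same two kanji both ways and dumps the bytes. It assumes the core Encode module (shipped with Perl 5.8 and later), which is not one of the modules discussed in this thread; the helper name hex_dump is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);   # assumption: core Encode, not a module named in this thread

# Two kanji written as code points, so the encoding of this source file is irrelevant.
my $text = "\x{65E5}\x{672C}";

my $jis_bytes  = encode('iso-2022-jp', $text);  # 7-bit, escape sequences switch character sets
my $sjis_bytes = encode('shiftjis',    $text);  # 8-bit, no escape sequences

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_dump { return join ' ', unpack '(H2)*', $_[0] }

print "iso-2022-jp: ", hex_dump($jis_bytes),  "\n";
print "shiftjis:    ", hex_dump($sjis_bytes), "\n";
```

The iso-2022-jp output should contain the ESC byte (1b) and nothing above 7f, while the shiftjis output should contain bytes with the high bit set and no ESC at all.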

Install 'Unicode::MapUTF8' - it probably does what you want:

use Unicode::MapUTF8 qw(from_utf8);

my $sjis_string = from_utf8({ -string => $utf8_string,
                             -charset => 'iso-2022-jp' });

I hope I will never have to maintain code like this. I could spend hours
trying to find out whether the author intended to use "sjis" (Shift-JIS) or
"iso-2022-jp" (JIS) encoding.

Alternatively, install the 'Jcode' module (Unicode::MapUTF8 forms a
'wrapper' around that and other Unicode modules to provide a single
consistent interface for _all_ Unicode charset convertors).
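
For comparison, here is a sketch of the same conversion done with Jcode directly. It assumes Jcode is installed from CPAN and that the input is a UTF-8 byte string; the variable names are illustrative and are chosen to say which encoding each one holds.

```perl
use strict;
use warnings;
use Jcode;   # CPAN module, not part of core Perl

# UTF-8 bytes for the two kanji U+65E5 U+672C, written as escapes.
my $utf8_bytes = "\xE6\x97\xA5\xE6\x9C\xAC";

# Tell Jcode the input is UTF-8, then ask for JIS (iso-2022-jp) output.
my $jis_bytes = Jcode->new($utf8_bytes, 'utf8')->jis;

# Print as hex so the escape sequences are visible on any terminal.
print join(' ', unpack '(H2)*', $jis_bytes), "\n";
```

Asking the same object for ->sjis or ->euc instead of ->jis would give Shift-JIS or EUC-JP output.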

(ps - the charset that I'm talking about can be found at:

It would be really, really cool if perl had the same charset codes, or at
least an alias to them. That way, one wouldn't have to go through this 'is the
there' junk. Unfortunately there seem to be 10 aliases for charsets all
over the place.

If Japanese information processing is your main concern I would go
for BTW, last week the SJIS-string module was released
on CPAN. I don't know how reliable it is, but maybe it's worth a try.

