perl-i18n

Using :encoding and :crlf together?

2006-02-06 06:13:01
Hi,

I'm new to this list, and I've tried searching the archives but I couldn't find anything like this. I'm using Perl v5.8.7, and I'm currently tearing my hair out trying to get the :encoding and :crlf layers to play nicely with each other.

My problem is that I'm developing a system which, as part of its job, needs to be able to read and write files in most encodings. I'm using :encoding for this - so far, so good.

For readability and compatibility reasons, these files should have CR/LF line endings, although this problem is equally applicable with or without them. So, I figure the :crlf layer works for this.

Unfortunately, trying to get :crlf and :encoding to do the Right Thing with each other seems to be like trying to pull hens' teeth. Here's an example of what I was doing at first:

open(FILE, ">:crlf:encoding(UTF-8)", "some-file.txt");

All seemed to work fine until I tested outputting as UTF-16 instead of UTF-8, at which point I discovered that the encoding layer wasn't encoding the inserted CRs, thus screwing up the UTF-16 file. D'oh! Okay, so swap the layers:
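
To make the corruption concrete, here is a minimal sketch that simulates it with the Encode module alone, without any PerlIO layers. It assumes the inserted CR arrives as a single raw byte after encoding, which is what the behaviour described above suggests; hex_dump is a hypothetical helper introduced only for this illustration, and UTF-16BE is used so that no byte order mark gets in the way.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# What we want: CR and LF are both characters, so each becomes two bytes.
my $good = encode('UTF-16BE', "A\r\n");

# Simulation of the problem (an assumption about what happens inside the
# layer stack): encode first, then insert a single CR byte before each
# LF byte in the already-encoded data.
my $bad = encode('UTF-16BE', "A\n");
$bad =~ s/\x0A/\x0D\x0A/g;

my $good_hex = hex_dump($good);
my $bad_hex  = hex_dump($bad);

print "expected: $good_hex\n";   # 00 41 00 0D 00 0A
print "broken:   $bad_hex\n";    # 00 41 00 0D 0A
```

The broken output has an odd number of bytes, so every 16-bit code unit after the first line ending is misaligned.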

open(FILE, ">:encoding(UTF-16):crlf", "some-file.txt");

Seems like everything should work there, but now I get problems trying to print some characters. For example, trying to print a \x{A3} (a British pound sign) results in:

"Malformed UTF-8 character (unexpected continuation byte 0xa3, with no preceding start byte) in null operation at ./utf16-test.pl line 6."

...and the output file contains a null character where the sign should be. Strangely, using a literal £ UTF-8 sequence (i.e. C2 A3) in the Perl file works fine. Here's the file that generates the above error:

---
#!/usr/bin/perl

open(FILE, ">:encoding(UTF-16):crlf", "test");
print FILE "Test \x{A3}45!\n";
print FILE "Test!\n";
close(FILE);
---

Yes, line 6 is the close() line. Removing :crlf from the layers fixes the problem, so I'm wondering if this is a bug in the implementation of :crlf. I'd really like to have some sort of transparent CR/LF conversion though, as it makes things a lot easier.
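
For comparison, one possible way to sidestep the layer interaction is to do the CR/LF translation on characters by hand and then encode the result with the Encode module, writing the bytes through a :raw handle. This is only a sketch under stated assumptions, not a fix for :crlf itself: encode_crlf is a hypothetical helper, UTF-16LE with a hand-written byte order mark is an arbitrary choice, and the file name is the one from the first example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn every line ending into CR/LF while the data
# is still a character string, then encode the whole string in one go.
sub encode_crlf {
    my ($encoding, $text) = @_;
    (my $copy = $text) =~ s/\r?\n/\r\n/g;
    return encode($encoding, $copy);
}

my $bytes = encode_crlf('UTF-16LE', "Test \x{A3}45!\nTest!\n");

# Write the bytes untouched; no :crlf or :encoding layer is involved.
open(my $fh, '>:raw', 'some-file.txt')
    or die "Cannot open some-file.txt: $!";
print {$fh} "\xFF\xFE";    # UTF-16LE byte order mark, written once
print {$fh} $bytes;
close($fh) or die "Cannot close some-file.txt: $!";
```

The obvious cost is that the conversion is no longer transparent: every write has to go through the helper instead of a plain print.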

Is this a known problem?

 - Ciaran.
