making utf8-clean CPAN distributions

I'm looking for help from you guys on an important forward-looking question.

What I would like to do is create my CPAN module distributions such that all of the files in each distro, code and documentation and tests and logs alike, are properly UTF-8 encoded, and do this in such a way that no modern Perl distributions or the automated CPAN tools will break. Note that my modules in question are so bleeding edge that I don't expect to have any legacy users yet to worry about breaking compatibility with. It's a given that I'm using UTF-8, which is supposed to be network safe and to have only a single byte order; I don't plan to touch UTF-16 or any other variations.

For tools, I have on my machine Mac OS X 10.3.6 (operating system), and BBEdit 8.0.3 (text editor), both of which are fully Unicode aware. I'm also using the latest Perl, version 5.8.6, compiled myself with the OS-bundled dev tools such as GCC 3.3 etcetera. I am making the distributions formally require 5.008.

0. For my main question, is distribution as Unicode files a good idea at all currently, though few if any people do it?


The following questions only apply if the answer to the above is "yes".

1. BBEdit gives me an option to have a byte-order mark in UTF-8 files (that happens to be 3 octets long I think), with the recommendation being to use it; I also have the choice not to, which makes the file more similar to many other ASCII-like encodings. So should I save the files with the BOM or without?

2. I am given a separate option to use either Unicode line breaks or one of Unix/Mac/Win; all 4 are given as options to use with a Unicode encoding. In my own tests, Perl 5.8 complained when the Unicode line break was used with UTF-8, but not the Unix line break (I was not, however, using any special pragmas). So should I use the Unicode line break or the Unix line break, assuming the former can be made to work?

2.1 Will the addition of "use utf8" on the first line of a Perl file cause Perl to accept files with Unicode line breaks?

3. Can a "use utf8" be put anywhere besides the first line of a file? What if I customarily put POD on the first few lines and the package declaration beneath it? Also, in a script file, which goes first, the #!perl or the use-utf8?

4. What about plain POD files? Since they contain no Perl code, will POD extractors know what to do, given that I can't put the use-utf8 in them?

5. Would the CPAN compare utility adapt to encoding changes, or would it consider an otherwise-identical file with different encodings to consist of one very large change?


6. In general, would anything on CPAN break? What about the automated testers?

7. Are there any other common issues that I should be aware of, and if so then what?
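
To make questions 1 and 2 concrete, here is a minimal sketch of how a file's BOM and line-break style could be checked before release. The helper names inspect_utf8_bytes and inspect_file are made up for illustration, and I am assuming that the editor's "Unicode" line break means U+2028 (LINE SEPARATOR); I have not confirmed that.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report the BOM and line-break styles found in a
# string of UTF-8 encoded bytes.
sub inspect_utf8_bytes {
    my ($bytes) = @_;

    # The UTF-8 byte-order mark is the three octets EF BB BF.
    my $has_bom = substr($bytes, 0, 3) eq "\xEF\xBB\xBF" ? 1 : 0;

    # Decode to characters so U+2028 can be matched as one code point.
    my $text = decode('UTF-8', $bytes);

    my %report = (bom => $has_bom, crlf => 0, lf => 0, cr => 0, u2028 => 0);
    $report{crlf}++  while $text =~ /\x0D\x0A/g;       # Win
    $report{lf}++    while $text =~ /(?<!\x0D)\x0A/g;  # Unix
    $report{cr}++    while $text =~ /\x0D(?!\x0A)/g;   # classic Mac
    $report{u2028}++ while $text =~ /\x{2028}/g;       # assumed "Unicode" break
    return \%report;
}

# Hypothetical helper: read a file as raw octets and inspect it.
sub inspect_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    $bytes = '' unless defined $bytes;
    return inspect_utf8_bytes($bytes);
}

# Print one summary line per file named on the command line.
for my $path (@ARGV) {
    my $r = inspect_file($path);
    printf "%s: bom=%d crlf=%d lf=%d cr=%d u2028=%d\n",
        $path, @{$r}{qw(bom crlf lf cr u2028)};
}
```

Run over every file in a distro, this would at least show whether the files are consistent with each other, whichever answers to 1 and 2 turn out to be right.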

AFAIK, Perl 6 is going to expect its code files to be Unicode by default? I know Larry said that some Unicode characters would be used by the language grammar.


Thanks for any input you can give.

-- Darren Duncan