ietf

Re: [RFC 959] FTP in ASCII mode

2006-02-21 00:33:33
First of all thanks to everybody for the response.

I knew that an FTP transfer in ASCII mode does EOL and EOF conversions based
on the OS of the receiving system. So I fully expected my UTF-8 encoded
file to get garbled when I FTPed it in ASCII mode. But guess what, it was
not garbled on the receiving system. Maybe I was lucky, or maybe it's because
UTF-8 is backward compatible with ASCII. But then, as ASCII is purely
7-bit, FTP in ASCII mode should have corrupted the UTF-8 encoded file,
because UTF-8 uses all 8 bits of each byte.
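
To make the two competing effects above concrete, here is a rough Python
sketch (an illustration only, not how any particular FTP client or server
works). The sample text and the helper name lf_to_crlf are assumptions. It
checks that a byte-level LF to CRLF conversion cannot hit the inside of a
multi-byte UTF-8 character, and that a path which masks bytes to 7 bits
would corrupt the file.

```python
# Hypothetical sample: two lines of Devanagari text (not the original file).
text = "\u0928\u092e\u0938\u094d\u0924\u0947\n\u0926\u0941\u0928\u093f\u092f\u093e\n"
data = text.encode("utf-8")


def lf_to_crlf(raw: bytes) -> bytes:
    """Naive byte-level EOL conversion, as a Unix TYPE A sender might do."""
    return raw.replace(b"\n", b"\r\n")


# Bytes belonging to multi-byte UTF-8 sequences all have the high bit set,
# so none of them can be mistaken for CR (0x0D) or LF (0x0A).
multibyte = b"".join(ch.encode("utf-8") for ch in text if ord(ch) > 0x7F)
high_bit_everywhere = all(b >= 0x80 for b in multibyte)

# On an 8-bit-clean path the converted bytes still decode to the same text,
# only with CRLF line endings.
eol_safe = lf_to_crlf(data).decode("utf-8") == text.replace("\n", "\r\n")

# A path that masks every byte to 7 bits destroys the non-ASCII characters.
stripped = bytes(b & 0x7F for b in data)
survives_7bit = stripped == data

print(high_bit_everywhere, eol_safe, survives_7bit)  # True True False
```

So whether the file survives seems to depend on whether the implementations
involved are 8-bit clean, not on the EOL conversion itself.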

Moreover, in the ASCII code page, code point 13=CR and code point 10=LF, but
that might not be the case in every other code page. Hence the EOL
conversion (in FTP ASCII mode) might corrupt a text file if it is encoded
using a non-ASCII encoding. And what about handling the Unicode NewLine
characters? Anyway...
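
A small Python sketch of that concern, under stated assumptions: cp037 is
used as one example of an EBCDIC code page, UTF-16 as one example of an
encoding where the byte 0x0A can occur inside a character, and lf_to_crlf is
a hypothetical stand-in for a naive byte-level EOL conversion.

```python
def lf_to_crlf(raw: bytes) -> bytes:
    """Naive byte-level EOL conversion that assumes ASCII-style newlines."""
    return raw.replace(b"\n", b"\r\n")


# In EBCDIC code page 037 the line feed is not byte 0x0A, so a converter
# that looks for 0x0A would not find the line endings at all.
ebcdic_lf = "\n".encode("cp037")
lf_is_0x0a_in_ebcdic = ebcdic_lf == b"\x0a"

# In UTF-16 the byte 0x0A also appears inside other characters, for
# example U+010A. Converting at the byte level then damages the text.
text = "\u010a\n"
data = text.encode("utf-16-le")          # bytes: 0A 01 0A 00
converted = lf_to_crlf(data)             # bytes: 0D 0A 01 0D 0A 00
decoded = converted.decode("utf-16-le", errors="replace")
utf16_ok = decoded == text.replace("\n", "\r\n")

print(lf_is_0x0a_in_ebcdic, utf16_ok)  # False False
```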

After reading all the wonderful replies, my conclusion is that even though my
FTP client/server handled the UTF-8 encoded text file (which BTW contained
Devanagari characters) correctly, a text file in an encoding other than
ASCII runs the risk of being corrupted when FTPed in ASCII mode. Therefore,
use ASCII mode only to transfer ASCII encoded files, and Binary mode to
transfer non-ASCII encoded files.

I was wondering why there isn't something like a "Text" mode for FTPing text
files, which could handle text files in any encoding available in this
world while the FTP client/server still does the EOL and EOF conversions
properly?

Thanks,
Sandeep.


On 2/21/06, Masataka Ohta 
<mohta(_at_)necom830(_dot_)hpcl(_dot_)titech(_dot_)ac(_dot_)jp> wrote:

John C Klensin wrote:

Sandeep's question raises another interesting issue.  I just
went back and reread RFC 2640.   It does not seem to address the
"TYPE A" issue at all.  It does say (Section 2, paragraph 1)
"Clients and servers are, however, under no obligation to
perform any conversion on the contents of a file for operations
such as STOR or RETR", which I would take to imply that it
anticipates I18N FTP operations to be entirely binary ("TYPE I")
although that is not explicit.

As for Japanese processing, UTF-8 is not visible to users or on
the network, because UTF-8 is not only useless but also harmful.

Instead, ISO-2022-JP, ShiftJIS and EUC are the major character sets.
Some ftp implementations do assume (sometimes depending on environment
variables) a network character code of ShiftJIS or EUC and perform the
corresponding conversions, which garbles UTF-8.

On the other hand, if you use ISO-2022-JP, which is 7 bit pure and ASCII
compatible (in a sense, it is pure ASCII), we can safely use ASCII mode
of vanilla ftp and there is no confusion as long as we are in an ASCII
environment.
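
As a rough check of the 7 bit claim, here is a Python sketch. The sample
string is an assumption, and Python's iso2022_jp codec stands in for
whatever encoder is actually in use. It verifies that every encoded byte is
below 0x80 and that a byte-level LF to CRLF conversion leaves the text
intact apart from the line endings.

```python
# Hypothetical sample: two short lines of kanji, each ending in a newline.
text = "\u65e5\u672c\u8a9e\n\u6771\u4eac\n"
data = text.encode("iso2022_jp")

# Every byte, including the escape sequences that switch character sets,
# fits in 7 bits.
seven_bit = all(b < 0x80 for b in data)

# The two-byte characters use only bytes 0x21-0x7E, so CR and LF can only
# appear as real line endings; a byte-level conversion is therefore safe.
converted = data.replace(b"\n", b"\r\n")
eol_safe = converted.decode("iso2022_jp") == text.replace("\n", "\r\n")

print(seven_bit, eol_safe)  # True True
```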

A similar encoding can be profiled using ISO 2022 to obtain a fully
internationalized, 7 bit pure, ASCII compatible character encoding.

The only problem for RFC2460 was that it needs neither MIME for
charset nor an 8bit extension, which makes it clear that MIME is
useless.

Note that long term state maintenance of full ISO 2022 is not
more complex than that of UTF-8. Note also that a carefully profiled
ISO 2022, such as ISO-2022-JP, requires state maintenance that is a lot
simpler than that of UTF-8.

Whether the characters in use are UTF-8 or not, we've still got
that issue with line-endings.

Line-ending issues of any ISO 2022 based encoding are just as simple
as those of ASCII.

                                                        Masataka Ohta



_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf
