
Re: [RFC 959] FTP in ASCII mode

2006-02-21 06:12:05
Sandeep Srivastava wrote:
Thanks John. Please see my response in-line...

On 2/21/06, John C Klensin <john-ietf(_at_)jck(_dot_)com> wrote:



    --On Tuesday, 21 February, 2006 12:53 +0530 Sandeep Srivastava
    <sandeep(_dot_)kumar(_dot_)srivastava(_at_)gmail(_dot_)com> wrote:

     > First of all thanks to everybody for the response.
     >
     > I knew that an FTP transfer in ASCII mode does EOL and EOF
     > conversions based on the OS of the receiving system.

    No, it doesn't.  That was part of the point.  It does no EOF
    conversions at all.   The command and data channels were
    separated for several reasons, but the desire to stay out of the
    EOF business was an important one.

Right. I understand the command/data channel part, i.e. instead of sending the EOF as data, it is sent as a command, and the receiver can then use the OS-specific EOF. The overall effect for the end user is as if both EOL and EOF were converted to the receiving OS defaults.
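For reference, RFC 959 says that in stream mode with file structure, end of file is indicated by the sending host closing the data connection; all bytes on that connection are data, and no EOF marker is sent either in-band or as a command. A minimal sketch of that behaviour, assuming Python's `socket.socketpair()` as a stand-in for the real TCP data connection (an actual client would set that up with PORT or PASV); `send_file` and `receive_file` are hypothetical names:

```python
import socket


def send_file(data_conn: socket.socket, data: bytes) -> None:
    # Stream mode, file structure: every octet sent is file data.
    data_conn.sendall(data)
    # Closing the data connection is the end-of-file signal.
    data_conn.close()


def receive_file(data_conn: socket.socket) -> bytes:
    chunks = []
    while True:
        chunk = data_conn.recv(4096)
        if not chunk:  # b"" means the sender closed: end of file
            break
        chunks.append(chunk)
    data_conn.close()
    return b"".join(chunks)


# socketpair() stands in for the TCP data connection set up via PORT/PASV.
sender, receiver = socket.socketpair()
send_file(sender, b"line one\r\nline two\r\n")
print(receive_file(receiver))  # b'line one\r\nline two\r\n'
```

The receiver then writes whatever local end-of-file its filesystem uses; FTP never names one.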


    And the server is required
    to convert whatever line-end convention it uses to CRLF, and any
    characters it uses to ASCII, and transmit that over the wire.


I don't understand this point very well. Does it mean that, per the FTP RFC, the server reads 8 bits at a time and sets the most significant bit to zero (because ASCII is 7-bit) before transmitting in ASCII mode? If so, how did a UTF-8 encoded file containing Devanagari characters (i.e. characters above 0x7F) get FTPied over (and back) correctly in ASCII mode?

If not, i.e. it does not set the MSB to zero, then how does ASCII mode differ from Binary mode?

Scenario:
I am using WS_FTP Pro as the client on my Windows 2000 machine to FTP files to and back from a Solaris box (acting as the FTP server).

Thanks,
Sandeep.


Sending ASCII from DOS/Windows means there is nothing to do on the DOS
side; Solaris replaces CR/LF with LF only before storing.

Receiving ASCII on DOS/Windows means the Solaris box must replace each
lone LF with CR/LF before sending. If both DOS/Windows and Solaris
are lazy enough not to mask the 0x80 bit, then you are lucky.
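A rough sketch of what an LF-based server like the Solaris box does in ASCII mode. The function names and the `mask_high_bit` switch are hypothetical; the switch models a "strict" implementation that clears the 0x80 bit. With it off, the UTF-8 octets pass through and the round trip is exact. With it on, the Devanagari is destroyed, and masking even manufactures a stray CR (the virama's last octet, 0x8D, becomes 0x0D):

```python
def server_store(wire: bytes, mask_high_bit: bool = False) -> bytes:
    """STOR in ASCII mode on a hypothetical LF-based server: CRLF -> LF."""
    if mask_high_bit:
        # "Strict" behaviour: force every octet into 7-bit ASCII.
        wire = bytes(b & 0x7F for b in wire)
    return wire.replace(b"\r\n", b"\n")


def server_retrieve(local: bytes, mask_high_bit: bool = False) -> bytes:
    """RETR in ASCII mode on the same server: LF -> CRLF."""
    if mask_high_bit:
        local = bytes(b & 0x7F for b in local)
    return local.replace(b"\n", b"\r\n")


# Devanagari text from a Windows client; each code point here is 3 UTF-8 octets.
wire = "नमस्ते\r\n".encode("utf-8")

stored = server_store(wire)
print(stored == "नमस्ते\n".encode("utf-8"))  # True: only the line end changed
print(server_retrieve(stored) == wire)       # True: exact round trip

mangled = server_store(wire, mask_high_bit=True)
print(all(b < 0x80 for b in mangled))        # True: pure 7-bit now
print(b"\r" in mangled)                      # True: 0x8D was masked to CR
```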

Problems could arise when your data contains CR or LF characters
that are not meant as line ends. You will only find them when you
least suspect them :)
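A small illustration of the stray-LF case, assuming a hypothetical LF-based server that maps CRLF to LF on STOR and LF back to CRLF on RETR. Once stored, the stray LF and the real line ends look identical, so the file cannot come back as it went out:

```python
def store_on_lf_host(wire: bytes) -> bytes:
    # ASCII-mode STOR on a hypothetical LF-based server: CRLF -> LF.
    return wire.replace(b"\r\n", b"\n")


def retrieve_from_lf_host(local: bytes) -> bytes:
    # ASCII-mode RETR from the same server: LF -> CRLF.
    return local.replace(b"\n", b"\r\n")


# A Windows file whose first line contains a stray LF inside the data.
# The Windows client already uses CRLF, so it sends the octets unchanged.
original = b"field 1\nstill field 1\r\nnext line\r\n"
round_trip = retrieve_from_lf_host(store_on_lf_host(original))
print(round_trip == original)  # False
print(round_trip)  # b'field 1\r\nstill field 1\r\nnext line\r\n'
```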


I remember having sent umlauts (ae=ä) (oe=ö) (ue=ü) between DOS and
Windows. They use different character sets, and the result looked
nasty. If Solaris and Windows use the same character set (the same
printer?), you could be lucky again.
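The umlaut problem can be reproduced with Python's codecs, assuming DOS code page 850 on one side and Windows code page 1252 on the other. ASCII mode carries the octets unchanged, so the receiver simply reinterprets them in its own character set:

```python
# "äöü" as a DOS (code page 850) file would store it.
dos_bytes = "äöü".encode("cp850")
print(dos_bytes)  # b'\x84\x94\x81'

# The same octets read as Windows code page 1252: 0x84 and 0x94 are
# typographic quotation marks there, and 0x81 is unassigned, so it is
# shown as the Unicode replacement character.
print(dos_bytes.decode("cp1252", errors="replace"))
```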


Peter


    If the client then converts from CRLF and ASCII to some local
    convention, that is its business, not that of the protocol.  In
    other words, there are, at most, conversions to and from CRLF
    and ASCII. There are no FTP-specified conversions based on the
    properties of the receiving system.

     > And I
     > very much expected my UTF-8 encoded file to get garbled when I
     > FTPied it in ASCII mode. But guess what, it was not garbled on
     > the receiving system. Maybe I was lucky, or maybe it's because
     > UTF-8 is backward compatible with ASCII. But then, as ASCII is
     > purely 7-bits, the FTP in ASCII mode should have corrupted the
     > UTF-8 encoded file, because UTF-8 is 8-bits.

    "Should have corrupted" is what I referred to as an ambiguity in
    my note.   First of all, because of the robustness principle,
    you can never guarantee that bad things will happen when they
    might -- proper implementation of protocols around here often
    argues for never trashing a string because one can or because a
    correct string wouldn't have the problem.

    So, in practice, if an FTP server was implemented on an ASCII
    system that used the "right justified in octets" model but with
    LF as line-end, the authors might well have said "the character
    codes don't need any conversion for ASCII mode, we just need to
    implement conversion to CRLF".  If they had done that, and UTF-8
    (or ISO 8859 Latin-1 or...) were added to the system, those CCSs
    would go through nicely in ASCII mode, with the right
    line-endings.  Substantially the same thing would occur, as
    Ohta-san points out, with many of the ISO 2022-based encodings
    of non-ASCII characters: completely safely with some of them and
    at least as safely as UTF-8 with the others although, as with
    UTF-8, the claim of strict ASCII would be technically false.
    Now that wouldn't happen with a system that was natively EBCDIC,
    or ASCII stored in seven bit chunks without padding, etc.: those
    systems would need to do real conversions to get to network
    ASCII and, if you thought you were getting UTF-8 over them, you
    would be in big trouble.
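A quick way to see why the "lucky" encodings survive such a server, using Python's `utf-8` and `iso2022_jp` codecs as examples (the helper names are hypothetical). Two checks matter: whether the encoded octets are all 7-bit, which is safe even against a server that clears the 0x80 bit, and whether every 0x0A octet is a real line end, which is safe against byte-oriented LF/CRLF conversion:

```python
def is_seven_bit(data: bytes) -> bool:
    # Safe even through a server that clears the 0x80 bit of every octet.
    return all(b < 0x80 for b in data)


def lf_octets_are_line_ends(data: bytes, text: str) -> bool:
    # Every 0x0A octet corresponds to a real "\n" in the text, so a
    # byte-oriented LF <-> CRLF conversion cannot hit part of a character.
    return data.count(b"\n") == text.count("\n")


for codec, text in [("utf-8", "देवनागरी\n"), ("iso2022_jp", "日本語\n")]:
    data = text.encode(codec)
    print(codec, is_seven_bit(data), lf_octets_are_line_ends(data, text))
# utf-8 False True
# iso2022_jp True True
```

UTF-8 passes the second check by design: every octet of a multi-octet sequence is 0x80 or above, so none can look like CR or LF. It fails the first, which is why it only gets through a "lazy" server, while the 7-bit ISO 2022 encoding gets through either kind.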

     > Moreover, in ASCII code page, code point 13=CR and code point
     > 10=LF, but that might not be the case in every other code
     > page. Hence the EOL conversion (in FTP ASCII mode) might
     > corrupt that text file if it is encoded using a non-ASCII
     > encoding. And what about handling the Unicode NewLine
     > characters? Anyway...

    Again, there is no conversion in the FTP protocol to the local
    character set, only conversion to network ASCII with its CRLF
    line endings (and, outside the protocol but common in client
    implementations, conversion from it).
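The cases where a byte-oriented LF-to-CRLF step really does cause damage are the ones raised above: encodings in which a line end is not the single octet 0x0A. A sketch, assuming UTF-16LE and EBCDIC code page 037 as examples; `lf_to_crlf` is a hypothetical stand-in for a naive ASCII-mode sender:

```python
def lf_to_crlf(data: bytes) -> bytes:
    # Byte-oriented conversion, as a naive ASCII-mode sender on an LF host might do.
    return data.replace(b"\n", b"\r\n")


utf16 = "A\nB".encode("utf-16-le")  # octets 41 00 0a 00 42 00
converted = lf_to_crlf(utf16)
print(len(converted))  # 7: odd length, no longer valid UTF-16
try:
    converted.decode("utf-16-le")
except UnicodeDecodeError as exc:
    print("UTF-16 corrupted:", exc.reason)

# In EBCDIC code page 037, CR is 0x0D but LF is 0x25, so a host storing
# EBCDIC must do a real character-set conversion to reach network ASCII.
print("\r".encode("cp037").hex(), "\n".encode("cp037").hex())  # 0d 25
```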

     > After reading all the wonderful replies, my conclusion is,
     > even though my FTP client/server handled the UTF-8 encoded
     > text file (which BTW contained Devanagri characters)
     > correctly, a text file encoded in an encoding other than
     > ASCII runs the risk of being corrupted when FTPied in ASCII
     > mode. Therefore, always use ASCII mode to
     > transfer only ASCII encoded files, and Binary mode to transfer
     > non-ASCII encoded files.

    Yes, that is probably wise guidance.  However, if you transfer
    textual materials in binary (Image) mode, you also need to be
    sure that you have programs available on the receiving host to
    change line-end conventions from whatever the server uses
    internally to whatever the client system uses.
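For the binary-mode route, the receiving side needs a line-end fixer of its own. A minimal sketch, assuming the text is in an encoding where CR and LF are the single octets 0x0D and 0x0A (ASCII, ISO 8859-x, UTF-8, but not UTF-16 or EBCDIC); `normalize_line_ends` is a hypothetical helper, not part of any FTP client:

```python
import re


def normalize_line_ends(data: bytes, local_eol: bytes) -> bytes:
    """Rewrite CRLF, lone CR and lone LF to the local line-end convention.

    Meant as a post-processing step after an Image (binary) transfer;
    FTP itself does none of this in Image mode.
    """
    # CRLF is listed first so it is matched as one line end, not two.
    return re.sub(rb"\r\n|\r|\n", local_eol, data)


# A UTF-8 file fetched in binary mode from an LF-based host.
received = "नमस्ते\nline two\n".encode("utf-8")
fixed = normalize_line_ends(received, b"\r\n")
print(fixed.decode("utf-8").splitlines())  # ['नमस्ते', 'line two']
```

Treating a lone CR as a line end is itself a guess; for files that may contain stray CRs as data, drop that alternative from the pattern.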

     > I was wondering why there isn't something like a "Text" mode
     > for FTPing text files, which could handle text files encoded
     > using any encoding available in this world, while the FTP
     > client/server still does the EOL and EOF conversions properly.

    For starters, because it would require that every FTP server
    support at least the several thousand coded character sets in
    the world.  Even for end of line, there are significantly more
    different conventions than you seem to think there are.
    "Convert from whatever we use as text here to a single standard
    form, and then let the recipient sort out conversion from the
    standard form to its preferred local form" is much more
    plausible -- it requires the server to support one type of
    conversion, not thousands, and the client to support one type of
    conversion, not thousands.  In the early 1970s, the appropriate
    standard form for transmission was network ASCII (including both
    "right justified in eight bits" and CRLF).  Today, it is
    probably UTF-8 with CRLF (although I sympathize with Ohta-san's
    desire to be able to transmit 2022-based systems in canonical
    form) and I think we should be considering that TYPE.  But ideas
    about universal converters make both bad protocol design and bad
    implementations.
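A rough sketch of the "one conversion on each side" idea, assuming a hypothetical UTF-8-plus-CRLF transfer type (RFC 959 defines no such TYPE; it is only suggested above as worth considering). The charsets and line ends in the usage are assumptions for the example:

```python
def to_canonical(local: bytes, charset: str, eol: str) -> bytes:
    """Sender: local charset and line ends -> UTF-8 with CRLF."""
    text = local.decode(charset)
    return "\r\n".join(text.split(eol)).encode("utf-8")


def from_canonical(wire: bytes, charset: str, eol: str) -> bytes:
    """Receiver: UTF-8 with CRLF -> its own charset and line ends.

    Raises UnicodeEncodeError if the local charset cannot represent a
    character; a real protocol would need a policy for that.
    """
    text = wire.decode("utf-8")
    return eol.join(text.split("\r\n")).encode(charset)


# DOS host (code page 850, CRLF) sending to a Unix host (Latin-1, LF).
dos_file = "Grüße\r\n".encode("cp850")
wire = to_canonical(dos_file, "cp850", "\r\n")
unix_file = from_canonical(wire, "latin-1", "\n")
print(unix_file.decode("latin-1") == "Grüße\n")  # True
```

Each host implements only its own pair of conversions, to and from the canonical form, no matter how many other systems exist, instead of one converter per pair of systems.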

        john




------------------------------------------------------------------------

_______________________________________________
Ietf mailing list
Ietf(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/ietf


--
Peter and Karin Dambier
The Public-Root Consortium
Graeffstrasse 14
D-64646 Heppenheim
+49(6252)671-788 (Telekom)
+49(179)108-3978 (O2 Genion)
+49(6252)750-308 (VoIP: sipgate.de)
mail: peter(_at_)peter-dambier(_dot_)de
mail: peter(_at_)echnaton(_dot_)serveftp(_dot_)com
http://iason.site.voila.fr/
https://sourceforge.net/projects/iason/

