ietf-xml-mime

Re: Some text that may be useful for the update of RFC 2376

2000-03-23 02:21:13
In message "Re: Some text that may be useful for the update of RFC 2376",
Chris Lilley wrote...


I think that such a transcoder is very helpful because it works for
all textual formats and also because it is very efficient.

No, it is not helpful, because it makes life a lot more difficult for
everyone else and leads to data corruption.

Apparently, Martin and I do not agree with you.

Incidentally I don't see an answer to my question about what such an
XML-unaware transcoder would do when converting down from UTF-8 or UTF-16
to some 8-bit charset with all the unrepresentable characters. Since it
doesn't know XML it can't use NCRs. What does it do, silently replace these
characters with question marks? And that is somehow OK?

In the case of XML, conversion from Unicode to legacy encodings is not very 
useful.  Even when such a conversion is requested, a transcoder can simply 
give up when it encounters something unrepresentable.
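
To make the three possible behaviours concrete, here is a minimal Python
sketch.  The function name and the policy names are only illustrative
assumptions; the error handlers are the standard ones of str.encode().  Note
that the "ncr" policy is safe only in character data and attribute values,
which is exactly what an XML-unaware transcoder cannot know.

```python
def transcode_down(text: str, target: str, policy: str = "give_up") -> bytes:
    """Encode Unicode text into a legacy charset under one of three policies.

    give_up        -> raise UnicodeEncodeError on the first unrepresentable char
    question_marks -> silently substitute '?' (lossy)
    ncr            -> emit numeric character references (XML-aware use only)
    """
    handlers = {
        "give_up": "strict",
        "question_marks": "replace",
        "ncr": "xmlcharrefreplace",
    }
    return text.encode(target, errors=handlers[policy])


# U+00E9 exists in ISO-8859-1; U+65E5 and U+672C do not.
sample = "<p>caf\u00e9 \u65e5\u672c</p>"

lossy = transcode_down(sample, "iso-8859-1", "question_marks")
with_ncrs = transcode_down(sample, "iso-8859-1", "ncr")

try:
    transcode_down(sample, "iso-8859-1", "give_up")
    gave_up = False
except UnicodeEncodeError:
    gave_up = True  # the transcoder refuses instead of corrupting the data
```

With the default policy the conversion fails loudly, so nothing is silently
replaced.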


 >> The charset parameter is such a solution.
 >
 >It is one such solution. There are better ones, and indeed a much better
 >one in the XML specification.

That solution (the encoding declaration) works only for XML.  It is not bad 
when the MIME header is not available.  But when it is available, we must 
rely on the charset parameter.
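
As a rough illustration of this precedence, a receiver could be sketched in
Python as follows.  The function name and the fallback default are my
assumptions, and the sketch ignores byte order marks and encodings that are
not ASCII-compatible; it is not meant as a complete detection algorithm.

```python
import re
from email.message import Message

# In-band signature: the encoding pseudo-attribute of the XML declaration.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_charset(content_type, body: bytes, default: str = "utf-8") -> str:
    """Return the charset a receiver would use for an XML entity.

    The out-of-band charset parameter wins when present; the encoding
    declaration is consulted only when no charset parameter is available.
    """
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()  # lowercased, or None
        if charset:
            return charset
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return default  # assumption: caller chooses the fallback


doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'
from_header = effective_charset("text/xml; charset=Shift_JIS", doc)
from_declaration = effective_charset(None, doc)
```

Here from_header is taken from the MIME header even though the declaration
says otherwise, and from_declaration is used only because no header exists.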

For text/*, yes, we have to. Luckily there is application/* and model/* and
image/* and so forth for people using XML who care about data integrity and
don't want cheap text processing tools playing fast and loose with their
data.

I believe that "always the charset parameter" is the recommendation given 
in RFC 2130 and on the public page of the W3C I18N WG.

In my message "History of the charset issue", I tried to summarize 
my understanding of the history.  I am unable to ignore the consensus 
of W3C I18N WG, W3C XML Syntax WG, and W3C XML SIG&WG, and the recommendation 
shown in RFC 2130.

You are advocating different in-band encoding signatures for different
formats.  I think that this is a significant burden on users and 
specification developers.

You are advocating different out-of-band or in-band or mixed signatures for
different protocols.

The long-term goal is to make the file systems of operating systems aware of 
the charset parameter.  Editors know the charset, and they store the charset 
information in the file system.  This info is then passed to WWW servers and 
further passed to WWW browsers.  The charset info is completely hidden from 
users and everything is automatic.  There will be no data corruption.

As of today, we need an in-band signature and some tricks to keep the 
out-of-band signature and the in-band signature consistent.

Most modern WWW servers provide the charset parameter.  We only have to 
encourage them without repeating old arguments.

A solution that requires every "save as" of an XML
file to rewrite the (incorrect, but overridden by a MIME charset parameter)
encoding declaration, which was only incorrect because one of your "I know
how to fiddle with all text files" transcoders silently broke it in the
first place. This places, as you say, an intolerable burden on users.

You are confusing XML-unaware transcoders and XML-aware programs which 
save XML documents into files.
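
To show what I mean by an XML-aware program, here is a small Python sketch
that transcodes a document and rewrites the encoding declaration in the same
step, so that the in-band signature never disagrees with the bytes written.
All names are hypothetical, and the regular expressions handle only the
XML declaration at the very start of the document.

```python
import re

_DECL = re.compile(r'^<\?xml\s[^>]*?\?>')
_ENC = re.compile(r'(\sencoding\s*=\s*)(["\'])[^"\']*\2')
_VER = re.compile(r'\sversion\s*=\s*(["\'])[^"\']*\1')


def rewrite_declaration(text: str, target: str) -> str:
    """Make the XML declaration name `target` as the encoding."""
    match = _DECL.match(text)
    if match is None:
        # No declaration at all: add one (assumes XML 1.0).
        return f'<?xml version="1.0" encoding="{target}"?>\n' + text
    decl = match.group(0)
    if _ENC.search(decl):
        # Replace the existing encoding name, keeping the quote style.
        decl = _ENC.sub(
            lambda e: f"{e.group(1)}{e.group(2)}{target}{e.group(2)}",
            decl,
            count=1,
        )
    else:
        # Insert the encoding right after the version pseudo-attribute.
        decl = _VER.sub(
            lambda v: f'{v.group(0)} encoding="{target}"', decl, count=1
        )
    return decl + text[match.end():]


def xml_aware_save_as(data: bytes, source: str, target: str) -> bytes:
    """Transcode and fix the declaration together; strict encoding means
    the save fails rather than losing unrepresentable characters."""
    text = data.decode(source)
    return rewrite_declaration(text, target).encode(target)


original = '<?xml version="1.0" encoding="UTF-8"?><a>\u00e9</a>'.encode("utf-8")
saved = xml_aware_save_as(original, "utf-8", "iso-8859-1")
```

An XML-unaware transcoder, by contrast, would change the bytes and leave the
declaration alone, which is the inconsistency the charset parameter overrides.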

One of the things about XML, which differs from HTML, is typical patterns
of use. XML transmitted over HTTP is likely to be extensively manipulated
from the filesystem of both the server and the client, a common operation
which your proposal makes much more difficult, just to allow people who
write simple text processing tools to not add XML support. As a trade off,
I hope it is obvious to everyone else why this is such a bad idea.

I agree with the first and second sentences and completely disagree with 
the last sentence.


----
MURATA Makoto  muraw3c(_at_)attglobal(_dot_)net
