Re: Comments on draft-shafranovich-mime-sql-03

2013-02-05 09:06:02
* Yakov Shafranovich wrote:
[...]

I am interested in this situation:

  -> Someone wants to publish database contents or schema
  -> Use DB-specific dumping tool to create .sql file
  -> Puts .sql file on web server
  -> Server associates .sql with proposed media type

  -> Someone else downloads this resource
  -> Checks IANA registry for the media type
  -> Finds proposed specification

Note that there is no step "publisher of .sql file ensures that the dump
tool generates US-ASCII encoded text, or otherwise makes sure the text is
in a single character encoding and that the web server includes the
character encoding label in the `charset` parameter of the Content-Type
header when serving the .sql file". Experience suggests that responses
will include no label or an incorrect one, and that downloaders are likely
to ignore the charset parameter even if it is correctly specified. However,
reading the draft, the person in the scenario above would assume that he
has got US-ASCII encoded text, even though that is fairly unlikely, and
will become less likely still as "international text" in UTF-8 without
escapes becomes increasingly common.

Similarly, the draft would tell him to check some ISO standard for "the
Structured Query Language", even though most likely he should instead
identify which database software generated the file and check the manual
for that software to find out about the format of such files. As a simple
example, the dumps from <http://dumps.wikimedia.org/> read like this:

  -- MySQL dump 10.13  Distrib 5.1.66, for debian-linux-gnu (x86_64)
  --
  -- Host: 10.0.6.76    Database: frrwiki
  -- ------------------------------------------------------
  -- Server version     5.1.53-wm-log
  
  /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
  /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
  /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
  /*!40101 SET NAMES utf8 */;
  ...
  --
  -- Table structure for table `category`
  --
  
  DROP TABLE IF EXISTS `category`;
  /*!40101 SET @saved_cs_client     = @@character_set_client */;
  /*!40101 SET character_set_client = utf8 */; 
  ...

They do not currently use the proposed type, but if they did, you would
have to know the format of "MySQL dump" files and what the codes in the
comments here mean to conclude that these are actually UTF-8 encoded
files. Google will find other examples with `character_set_client` for
other character encodings like "latin1". The ISO standard, as far as I
am aware, will not help you there, and neither will the US-ASCII default
proposed in the draft.
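
To make the point concrete, here is a rough sketch of the kind of
tool-specific sniffing a downloader would need in order to guess the
encoding of a dump like the one above. It is only an illustration under
stated assumptions: the function name `guess_dump_encoding`, the regular
expression, and the small mapping from MySQL character set names to Python
codec names are all hypothetical and incomplete, and none of this comes
from the draft or from the ISO standard.

```python
import re

# Matches MySQL "versioned comments" of the two forms seen in the dump above:
#   /*!40101 SET NAMES utf8 */;
#   /*!40101 SET character_set_client = utf8 */;
_CHARSET_HINT = re.compile(
    rb"/\*!\d+\s+SET\s+(?:NAMES|character_set_client\s*=)\s*([A-Za-z0-9_]+)\s*\*/",
    re.IGNORECASE,
)

# Assumption: a hand-picked, incomplete mapping from MySQL character set
# names to Python codec names. MySQL's "latin1" is treated as cp1252 here.
_MYSQL_TO_PYTHON = {
    "utf8": "utf-8",
    "utf8mb4": "utf-8",
    "latin1": "cp1252",
    "ascii": "ascii",
}


def guess_dump_encoding(head: bytes, default: str = "ascii") -> str:
    """Guess the encoding of a MySQL-style dump from its first bytes.

    Falls back to `default` (US-ASCII, mirroring the default proposed in
    the draft) when no recognised hint is found.
    """
    match = _CHARSET_HINT.search(head)
    if match is None:
        return default
    name = match.group(1).decode("ascii").lower()
    return _MYSQL_TO_PYTHON.get(name, default)
```

A caller would pass the first few kilobytes of the file and decode the rest
with the returned codec name. Dumps from other database software would need
entirely different rules, which is exactly the problem.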
-- 
Björn Höhrmann · mailto:bjoern(_at_)hoehrmann(_dot_)de · 
http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
