perl-i18n

Re: [fwd] Re: [rt-devel] Language detection bug (from: ASnare(_at_)allshare(_dot_)nl)

2003-02-10 06:00:34
At 02:16 AM 10/02/2003 -0900, Sean M. Burke wrote:
At 2/10/2003 11:55 AM +0100, Andrew Snare wrote:
At 03:10 PM 7/02/2003 -0900, Sean M. Burke wrote:
[...] Incidentally, one thing that Locale::Maketext does do is the reverse: if a user accepts en-ca, Locale::Maketext will say "OK, I can also just give them en.pm". That is, this accept list:
Accept-Language: en-CA, es-MX
is treated as if it were really:
Accept-Language: en-CA, es-MX, en, es

I suppose I could easily have made it work so that it would instead read it as:
Accept-Language: en-CA, en, es-MX, es
I'm not particularly attached to either way; I bet there are theoretical and practical arguments both ways. Does anyone have a preference? It's all negotiable.

This is arguably wrong.

So actually argue it.  English please, no predicate logic.

To quote the RFC:

        A language-range matches a language-tag if
        it exactly equals the tag, or if it exactly equals a prefix of the
        tag such that the first tag character following the prefix is "-".
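
For anyone who'd rather see the quoted rule as code, here is a minimal sketch in Python. The function name is made up for illustration, the lower-casing assumes tags are compared case-insensitively, and the "*" wildcard range is deliberately not handled. It is not how Locale::Maketext or any particular server does it.

```python
def range_matches_tag(language_range: str, language_tag: str) -> bool:
    """Hypothetical helper: apply the quoted matching rule.

    language_range is what the client lists in Accept-Language;
    language_tag is what the server has available.
    """
    r = language_range.lower()  # assumption: case-insensitive comparison
    t = language_tag.lower()
    # Exact match, or the range is a prefix of the tag and the next
    # character of the tag after that prefix is "-".
    return t == r or t.startswith(r + "-")


print(range_matches_tag("en", "en-gb"))   # range is a prefix of the tag
print(range_matches_tag("en-gb", "en"))   # the reverse direction
print(range_matches_tag("en", "eng"))     # prefix, but not followed by "-"
```

Note that the check only runs in one direction: the range may be shorter than the tag, never the other way round.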

There's no clause saying that a match also occurs when the language-tag (what the server has available) is a prefix of the language-range (what the client says is acceptable). At the very least, allowing this is an 'extension' of the standard, so (I think) what we're discussing is whether that extension is allowed. The RFC's wording doesn't explicitly forbid it. However, to quote another section:

      Note: When making the choice of linguistic preference available to
      the user, we remind implementors of  the fact that users are not
      familiar with the details of language matching as described above,
      and should provide appropriate guidance. As an example, users
      might assume that on selecting "en-gb", they will be served any
      kind of English document if British English is not available. A
      user agent might suggest in such a case to add "en" to get the
      best matching behavior.

This example shows that whoever wrote the standard regards a missing 'xx' on a list that accepts 'xx-yy' as a client-side issue. For the example to make sense, the server must not send 'en' content when only 'en-gb' is on the accept-list.
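
To make the 'en-gb' example concrete, here is a rough sketch in Python of a server that applies only the strict rule. Both function names are hypothetical, q-values are ignored, and treating list order as preference order is a simplifying assumption of mine, not something the RFC says.

```python
from typing import List, Optional


def strict_match(language_range: str, language_tag: str) -> bool:
    """Hypothetical helper: range equals the tag, or is a prefix of it
    followed by "-" (compared case-insensitively, by assumption)."""
    r, t = language_range.lower(), language_tag.lower()
    return t == r or t.startswith(r + "-")


def pick_strict(accept_ranges: List[str], available_tags: List[str]) -> Optional[str]:
    """Return the first available tag matched by any accepted range,
    walking the accept list in order; None if nothing matches."""
    for language_range in accept_ranges:
        for tag in available_tags:
            if strict_match(language_range, tag):
                return tag
    return None


available = ["en", "fr"]  # the server has generic English and French only

# Client lists only British English: under the strict rule nothing matches.
print(pick_strict(["en-gb"], available))

# Client follows the RFC's advice and adds "en" as well.
print(pick_strict(["en-gb", "en"], available))
```

Under this reading the first call finds nothing and the second finds 'en', which is the situation the RFC's note is warning user agents about.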

As I've mentioned in earlier posts, exactly what the correct behaviour is remains ambiguous. I personally don't have any investment in a particular interpretation, but I am trying to ensure that whatever interpretation is made is consistent with the RFC. You're free to disagree with the RFC if you wish, but people should at least be aware that they are doing so.

Finally, please also read one of the references I gave in an earlier post:
        
<http://groups.google.com/groups?selm=Pine.HPP.3.95a.1000121173010.24389J-100000%40hpplus01.cern.ch>
The author seems to have arrived at the same interpretation of the RFC as I have, and illustrates it in a different (and possibly more helpful) way than I did.

Cheers,

 - Andrew