Re: [Slim] IETF last call for draft-ietf-slim-negotiating-human-language

On 2017-02-15 at 01:39, Randall Gellens wrote:

At 4:21 PM -0800 2/14/17, Randy Presuhn wrote:
 Hi -

 On 2/14/2017 2:43 PM, Randall Gellens wrote:
 At 8:59 PM +0100 2/14/17, Gunnar Hellström wrote:
  On 2017-02-14 at 19:05, Randy Presuhn wrote:
  Hi -

  On 2/14/2017 9:40 AM, Randall Gellens wrote:
  At 11:01 AM +0100 2/14/17, Gunnar Hellström wrote:
   My proposal for a reworded section 5.4 is:

   5.4.  Unusual language indications
   It is possible to specify an unusual indication where the language
   specified may look unexpected for the media type.

   For such cases the following guidance SHALL be applied for the
  humintlang attributes used in these situations.
   1. A view of a speaking person in the video stream SHALL, when it
   has relevance for speech perception, be indicated by a Language-Tag
   for spoken/written language with the "Zxxx" script subtag to indicate
   that the content is not written.
   2. Text captions included in the video stream SHALL be indicated
  by a Language-Tag for spoken/written language.

   3.    Any approximate representation of sign language or
  fingerspelling in the text media stream SHALL be indicated by a
  Language-Tag for a sign language in text media.

   4.    When sign language related audio from a person using sign
  language is of importance for language communication, this SHALL be
  indicated by a Language-Tag for a sign language in audio media.
  [RG] As I said, I think we should avoid specifying this until we have
  deployment experience.
  ...

  From a process perspective, it's far easier to remove constraints
  as a specification advances than it is to add them.
  I agree. It is often better to specify normatively as far as you can
 imagine, so that interoperability and good functionality are achieved.
 Stopping halfway and having MAY in the specifications creates
 uncertainty and less useful specifications.
 My reading of what Randy says is the opposite of Gunnar's. In my
 reading, Randy points out that it is easier to remove the SHOULD NOT in
 the future than it is to change the meaning of the combinations or
 switch to a different mechanism.

 In my experience, it's better to specify only what we know we need and
 what we know we understand.  Speculative specifications "as far as you
 can imagine" more often lead to interoperability problems, unnecessary
 complexity, limitations on what's needed in the future, and divergent
 implementations.
 I think the difference in your positions comes down to

   (1) your respective notions of "what we know we need and what we
       know we understand";

   (2) whether you believe that the interoperability and conformance
       consequences of removing a "SHOULD NOT" could be the same
       as those of merely retaining a "MUST" or "SHALL" - this determines
       whether Randy G.'s proposal provides a path for some future
       revision to mandate (if deployment experience substantiates the
       need/understanding) the behavior proposed by Gunnar. That path
       is not at all obvious to me.
The purpose of the draft is to enable the two endpoints of a real-time communication session to agree which languages and media to use for interactive communication. We have a mechanism of adding language tags to media stream negotiations. In most cases, the language and media modality are an obvious fit. There are combinations of media and language where the meaning is not so obvious, specifically, signed language tags with audio or text, and non-signed language tags with video. My proposal is that we say the offerer SHOULD NOT send such combinations and the answerer MAY ignore the language. This allows future specifications for the underlying uses Gunnar wants (such as real-time subtitles in video and signed equivalents in text). Such future specifications could define a use for the language and media combinations and remove the SHOULD NOT send and MAY ignore, or could define a new mechanism. I don't think we know enough now to dictate what the solution should be.
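
Read literally, the rule proposed above (offerer SHOULD NOT send, answerer MAY ignore) reduces to one filter over (media type, language tag) pairs. A minimal Python sketch under stated assumptions: the function names are hypothetical, and the small set of sign language subtags is a hand-picked sample, not the IANA Language Subtag Registry that a real implementation would consult.

```python
from typing import List

# Hypothetical sample of BCP 47 primary subtags for sign languages
# (sgn = generic sign languages, ase = ASL, bfi = BSL, swl = Swedish SL,
# gsg = German SL). A real implementation would consult the IANA registry.
SIGN_LANGUAGE_SUBTAGS = {"sgn", "ase", "bfi", "swl", "gsg"}


def is_sign_language(tag: str) -> bool:
    """True if the tag's primary subtag is in the sample sign language set."""
    return tag.lower().split("-")[0] in SIGN_LANGUAGE_SUBTAGS


def is_obvious_fit(media: str, tag: str) -> bool:
    """False only for the two combinations named as non-obvious:
    a signed language with audio or text, a non-signed language with video."""
    signed = is_sign_language(tag)
    if signed and media in ("audio", "text"):
        return False
    if not signed and media == "video":
        return False
    return True


def filter_language_tags(media: str, tags: List[str]) -> List[str]:
    """Drop non-obvious combinations, preserving order.

    In this sketch the offerer's SHOULD NOT send and the answerer's
    MAY ignore both reduce to this same filter.
    """
    return [tag for tag in tags if is_obvious_fit(media, tag)]
```

Under this rule a tag such as "en-Zxxx" on video (a view of a speaking person) is dropped along with any other non-signed tag on video.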

We have a fresh example from our own discussions in the SLIM group of how unfortunate it is not to be sufficiently explicit in the first edition of a standard: the SDP Lang attribute in RFC 4566, which you (Randall) say is intended for specifying a set of languages that must all be used in a session, while I say that it is intended for negotiation of at least one initial language. Having that uncertainty in a published specification makes it very hard to sharpen up the specification afterwards, because doing so could make some implementations non-conformant. It also makes potential implementors hesitant to use the current specifications, as happened with the SLIM work.


For 5.4.

I am OK with modifying from my latest proposal, but we need to be specific.
I am also OK with reducing the SHALLs to SHOULDs as Addison requested.

The situation is not that we lack knowledge. Here is what we know about the 4 cases of "unusual" indications:

1. View of the speaker in video. Very important for speech perception. Quality requirements are documented in ITU-T H-series Supplement 1. It is of real use only as a complement to the same spoken language in audio. Now that we know about the Zxxx notation for non-written content, we also have a good way of specifying it precisely.

This case was also described in section 5.2 already.

2. Text captions in the video stream.

This can be either text merged into video and communicated as a true part of the video image, or a text component of a multimedia system, such as MPEG-4, declared in SDP as m=video. It has been used in some videophone products, but I have not seen it used lately. It is a clearly defined case, and we can specify coding for it, but we do not at the moment know whether it will be important to specify it.


3. Sign language or fingerspelling in the text stream.

I have seen a product using it for claimed sign language conversation. It is also in use in a simple text form, with words in capitals approximately representing signs, between persons involved in the preparation of sign language productions and translations. But in that case it is in a session where they agree by other means to start using the text stream for that purpose. So I think we can say that this is rare, and its use can be agreed by other means between the users. Still, it is a clearly defined case.

4. Audio from a signing person related to sign language. This is more vague than the others. It may be a person signing in video and adding spoken words in audio to the signing, but influenced by the word order and grammar of sign language, with some ambition to make it reasonably understandable for both deaf and hearing participants. There are even some spoken words created from sign language that are commonly used by hearing persons in such situations. But even for that case, I think it is better to define the audio part as the spoken language it is derived from, because of its intention to be understandable for hearing persons. All other variants I can imagine are even closer to the spoken language and should be specified with a spoken language tag. If we only want to have the audio stream established to hear the background in the signing situation, then we should not specify language use for the audio stream. Even if we know what a sign language tag in an audio stream would indicate, it may be just as good to leave it undefined.

------------------------------------------------------------------------------------------------------------------------------------------------
So, new proposal:

5.4.  Unusual language indications

   It is possible to specify an unusual indication where the language
   specified may look unexpected for the media type.

   For such cases the following guidance SHOULD be applied for the
  humintlang attributes used in these situations.

   1.    A view of a speaking person in the video stream SHOULD, when it
   has relevance for speech perception, be indicated by a humintlang
   attribute with a Language-Tag for a spoken/written language with the
   "Zxxx" script subtag to indicate that the content is not written.

   2.    Text captions included in the video stream SHOULD be indicated
   by a humintlang attribute with a Language-Tag for a spoken/written
   language.

   3.    A Language-Tag for a sign language specified in a humintlang
   attribute for a text stream MAY be interpreted as use of an
   approximate representation of sign language or fingerspelling in the
   text media stream. The use of such representation is rare and
   usually conveniently agreed by other means between the users during
   an established session. Common support of this indication SHOULD NOT
   be assumed or required.

   4.    A Language-Tag for a sign language specified in a humintlang
   attribute for an audio stream SHOULD NOT be indicated and MAY be
   ignored on reception. Any use of spoken words or spoken language in
   the audio stream SHOULD, when it can be of importance for language
   communication, be indicated by the corresponding Language-Tag for
   spoken language in a humintlang attribute for the audio stream.
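
The four items above can be read as a decision table over (media type, language tag). A minimal Python sketch of how a receiver might classify a humintlang tag under that reading, with loud assumptions: the returned labels and function names are hypothetical, the sign language subtag set is a hand-picked sample rather than the IANA registry, and treating a spoken/written tag without "Zxxx" on video as captions (item 2) is this sketch's interpretation, not proposed text.

```python
from typing import Optional, Tuple

# Hypothetical sample of sign language primary subtags; a real
# implementation would consult the IANA Language Subtag Registry.
SIGN_LANGUAGE_SUBTAGS = {"sgn", "ase", "bfi", "swl", "gsg"}


def parse_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Return (primary subtag, script subtag or None), lowercased.

    A script subtag is four letters; scanning stops at a singleton
    (e.g. the "x" of private use) so extension content is not misread.
    """
    subtags = tag.lower().split("-")
    script = None
    for sub in subtags[1:]:
        if len(sub) == 1:
            break
        if len(sub) == 4 and sub.isalpha():
            script = sub
            break
    return subtags[0], script


def classify(media: str, tag: str) -> str:
    """Map a humintlang tag on a media stream to a hypothetical label."""
    primary, script = parse_tag(tag)
    signed = primary in SIGN_LANGUAGE_SUBTAGS
    if media == "video":
        if signed:
            return "sign-language"
        if script == "zxxx":
            return "speaker-view"           # item 1: not written, view of speaker
        return "captions-in-video"          # item 2 (interpretation)
    if media == "text":
        if signed:
            return "approximate-sign-text"  # item 3: MAY be interpreted; rare
        return "written-language"
    if media == "audio":
        if signed:
            return "ignore"                 # item 4: SHOULD NOT be sent, MAY be ignored
        return "spoken-language"
    return "unknown-media"
```

For example, classify("video", "en-Zxxx") yields "speaker-view" and classify("audio", "swl") yields "ignore".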





Gunnar


--
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar(_dot_)hellstrom(_at_)omnitor(_dot_)se
+46 708 204 288
