Re: [Slim] IETF last call for draft-ietf-slim-negotiating-human-language

At 3:52 PM -0800 2/15/17, Bernard Aboba wrote:

 Gunnar Hellstrom said:
"The SDP Lang attribute in RFC 4566, where you (Randall) say it is
intended for specifying a set of languages that all must be used in a
session, while I say that it is intended for negotiation of at least
one initial language."
[BA] At IETF 96 in Berlin, we had a discussion of the history of the
SDP Lang attribute within the MMUSIC WG.
The Lang attribute was originally specified in RFC 2327, which was
published in April 1998, more than four years prior to the publication
of Offer/Answer RFC 3264 (June 2002), and three years prior to
publication of the initial draft-rosenberg-mmusic-sdp-offer-answer-00
(April 26, 2001).
As a result, the Lang attribute could not have been designed for use in
Offer/Answer negotiation, but instead was intended for use in the
declarative SDP of multicast conferencing. Note that the Lang attribute
was not mentioned in RFC 3264, and no one at the MMUSIC WG session was
aware of a subsequent SIP Offer/Answer implementation of it.

Which is what I was saying: it is descriptive of the media, which is
very different from negotiation. However, this is all moot now.

On Wed, Feb 15, 2017 at 1:41 AM, Gunnar Hellström <gunnar(_dot_)hellstrom(_at_)omnitor(_dot_)se> wrote:
 On 2017-02-15 at 01:39, Randall Gellens wrote:

 At 4:21 PM -0800 2/14/17, Randy Presuhn wrote:

  Hi -

  On 2/14/2017 2:43 PM, Randall Gellens wrote:

  At 8:59 PM +0100 2/14/17, Gunnar Hellström wrote:

   On 2017-02-14 at 19:05, Randy Presuhn wrote:

   Hi -

   On 2/14/2017 9:40 AM, Randall Gellens wrote:

   At 11:01 AM +0100 2/14/17, Gunnar Hellström wrote:

    My proposal for a reworded section 5.4 is:

    5.4.  Unusual language indications

    It is possible to specify an unusual indication where the language
    specified may look unexpected for the media type.

    For such cases the following guidance SHALL be applied for the
   humintlang attributes used in these situations.

    1.    A view of a speaking person in the video stream SHALL, when it
   has relevance for speech perception, be indicated by a Language-Tag
   for spoken/written language with the "Zxxx" script subtag to indicate
   that the contents is not written.

    2.    Text captions included in the video stream SHALL be indicated
   by a Language-Tag for spoken/written language.

    3.    Any approximate representation of sign language or
   fingerspelling in the text media stream SHALL be indicated by a
   Language-Tag for a sign language in text media.

    4.    When sign language related audio from a person using sign
   language is of importance for language communication, this SHALL be
   indicated by a Language-Tag for a sign language in audio media.


   [RG] As I said, I think we should avoid specifying this until we have
   deployment experience.

   ...

   From a process perspective, it's far easier to remove constraints
   as a specification advances than it is to add them.

   I agree. It is often better to specify normatively as far as you can
  imagine, so that interoperability and good functionality are achieved.
  Stopping halfway and having MAY in the specifications creates
  uncertainty and less useful specifications.


  My reading of what Randy says is the opposite of Gunnar's. In my
  reading, Randy points out that it is easier to remove the SHOULD NOT in
  the future than it is to change the meaning of the combinations or
  switch to a different mechanism.

  In my experience, it's better to specify only what we know we need and
  what we know we understand.  Speculative specifications "as far as you
  can imagine" more often lead to interoperability problems, unnecessary
  complexity, limitations on what's needed in the future, and divergent
  implementations.


  I think the difference in your positions comes down to

    (1) your respective notions of "what we know we need and what we
        know we understand";

    (2) whether you believe that the interoperability and conformance
        consequences of removing a "SHOULD NOT" could be the same
        as those of merely retaining a "MUST" or "SHALL" - this determines
        whether Randy G.'s proposal provides a path for some future
        revision to mandate (if deployment experience substantiates the
        need/understanding) the behavior proposed by Gunnar. That path
        is not at all obvious to me.
The purpose of the draft is to enable the two endpoints of a real-time
communication session to agree which languages and media to use for
interactive communication. We have a mechanism of adding language tags
to media stream negotiations. In most cases, the language and media
modality are an obvious fit. There are combinations of media and
language where the meaning is not so obvious, specifically, signed
language tags with audio or text, and non-signed language tags with
video. My proposal is that we say the offerer SHOULD NOT send such
combinations and the answerer MAY ignore the language. This allows
future specifications for the underlying uses Gunnar wants (such as
real-time subtitles in video and signed equivalents in text). Such
future specifications could define a use for the language and media
combinations and remove the SHOULD NOT send and MAY ignore, or could
define a new mechanism. I don't think we know enough now to dictate
what the solution should be.
We have a fresh example from our own discussions in the SLIM group of
how unfortunate it is to not be sufficiently explicit in the first
edition of a standard: the SDP Lang attribute in RFC 4566, where you
(Randall) say it is intended for specifying a set of languages that all
must be used in a session, while I say that it is intended for
negotiation of at least one initial language. Having that uncertainty
in a published specification makes it very hard to sharpen up the
specification afterwards, because doing so could make some
implementations nonconformant. And it makes potential implementors
hesitant to use the current specifications, as was the case with the
SLIM work.
 For 5.4.

 I am OK with modifying from my latest proposal, but we need to be specific.
 I am also OK with reducing the SHALLs to SHOULDs as Addison requested.
The situation is not that we lack knowledge. Here is what we know about
the 4 cases of "unusual" indications:

1. View of the speaker in video. Very important for speech perception.
Quality requirements are documented in ITU-T H-series Supplement 1. Of
real use only as a complement to the same spoken language in audio.
Now, when we know about the Zxxx notation for non-written, we also have
a good way of specifying it precisely.
 This case was also described in section 5.2 already.

 2. Text captions in the video stream.
This can be either text merged into the video and communicated as a
true part of the video image, or it can be a text component of a
multimedia system, such as MPEG-4, declared in SDP as m=video. It has
been used in some videophone products, but I have not seen it used
lately. It is a clearly defined case, and we can specify coding for it,
but we do not at the moment know if it will be important to specify it.
 3. Sign language or fingerspelling in the text stream.
I have seen a product using it for claimed sign language conversation.
It is also in use in the simple text form, with words in capitals
approximately representing signs, between persons involved in
preparation of sign language productions and translations. But in that
case it is in a session where they agree in other ways to start using
the text stream for that purpose. So I think we can say that this is
rare, and its use can be agreed by other means between the users.
Still, it is a clearly defined case.
4. Audio from a signing person related to sign language. This is more
vague than the others. It may be a person signing in video and adding
spoken words in audio to the signing, but influenced by the word order
and grammar of sign language, with some ambition to make it reasonably
understandable for both deaf and hearing participants. There are even
some spoken words created from sign language that are commonly used by
hearing persons in such situations. But for that case I think it is
still better to define the audio part as the spoken language it is
derived from, because of its intention to be understandable for hearing
persons. All other variants I can imagine are even closer to the spoken
language and should be specified with a spoken language tag. If we only
want to have the audio stream established to hear the background in the
signing situation, then we should not specify language use of the audio
stream. Even if we know what a sign language tag in an audio stream
would be, it may be just as good to leave it undefined.
------------------------------------------------------------------------------------------------------------------------------------------------
 So, new proposal:

 5.4.  Unusual language indications

    It is possible to specify an unusual indication where the language
    specified may look unexpected for the media type.

    For such cases the following guidance SHOULD be applied for the
   humintlang attributes used in these situations.

    1.    A view of a speaking person in the video stream SHOULD, when it
   has relevance for speech perception, be indicated by a humintlang
   attribute with a Language-Tag for a spoken/written language with the
   "Zxxx" script subtag to indicate that the contents is not written.

    2.    Text captions included in the video stream SHOULD be indicated
   by a humintlang attribute with Language-Tag for spoken/written language.

    3.    A Language-Tag for a sign language specified in a humintlang
   attribute for a text stream MAY be interpreted as use of an
   approximate representation of sign language or fingerspelling in the
   text media stream. The use of such representation is rare and usually
   conveniently agreed by other means between the users during an
   established session. Common support of this indication SHOULD NOT be
   assumed or required.

    4.    A Language-Tag for a sign language specified in a humintlang
   attribute for an audio stream SHOULD NOT be indicated and MAY be
   ignored on reception. Any use of spoken words or spoken language in
   the audio stream SHOULD, when it can be of importance for language
   communication, be indicated by the corresponding Language-Tag for
   spoken language in a humintlang attribute for the audio stream.

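
As an illustration only, the four items of the proposed 5.4 could be read
as a small lookup on the receiving side. The Python sketch below is one
possible reading, not text from the draft: the function names, the
returned labels, and the short set of sign-language subtags are all
assumptions made for the example. A real implementation would consult the
IANA Language Subtag Registry rather than a hard-coded set.

```python
# Illustrative subset of primary subtags for sign languages (assumption:
# a real implementation would look these up in the IANA registry).
SIGN_LANGUAGE_SUBTAGS = {"ase", "bfi", "swl"}


def is_sign_language(tag: str) -> bool:
    """True if the primary subtag is in the illustrative sign-language set."""
    return tag.split("-")[0].lower() in SIGN_LANGUAGE_SUBTAGS


def has_zxxx_script(tag: str) -> bool:
    """True if the tag carries the "Zxxx" (not written) script subtag."""
    return any(sub.lower() == "zxxx" for sub in tag.split("-")[1:])


def interpret(media: str, tag: str) -> str:
    """Map a (media type, Language-Tag) pair to a hypothetical label."""
    signed = is_sign_language(tag)
    if media == "video":
        if signed:
            return "sign language"            # the usual combination
        if has_zxxx_script(tag):
            return "view of speaker"          # item 1
        return "text captions"                # item 2
    if media == "text":
        if signed:
            return "approximate sign representation"  # item 3 (MAY)
        return "written language"             # the usual combination
    if media == "audio":
        if signed:
            return "ignore language"          # item 4 (SHOULD NOT send)
        return "spoken language"              # the usual combination
    return "unspecified"


if __name__ == "__main__":
    for media, tag in [("video", "en-Zxxx"), ("video", "en"),
                       ("text", "ase"), ("audio", "swl")]:
        print(media, tag, "->", interpret(media, tag))
```

In this reading an answerer that does not support item 3 would simply
treat the text/sign-language pair the same way as item 4 and ignore the
language indication.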
 Gunnar


 --
 -----------------------------------------
 Gunnar Hellström
 Omnitor
 
 gunnar(_dot_)hellstrom(_at_)omnitor(_dot_)se
 +46 708 204 288

 _______________________________________________
 SLIM mailing list
 SLIM(_at_)ietf(_dot_)org
 https://www.ietf.org/mailman/listinfo/slim



--
Randall Gellens
Opinions are personal;    facts are suspect;    I speak for myself only
-------------- Randomly selected tag: ---------------
Computers are not intelligent.  They only think they are.

Re: [Slim] IETF last call for draft-ietf-slim-negotiating-human-language (Section 5.4)