Re: [Slim] Review of draft-ietf-slim-negotiating-human-language-06

At 11:57 AM +0100 2/22/17, Gunnar Hellström wrote:

 A few comments inline,


 Den 2017-02-22 kl. 02:44, skrev Randall Gellens:

 Hi Dale,

 Thank you for your review, I appreciate it.  Please see inline.

 At 6:32 PM -0800 2/17/17, Dale Worley wrote:

  Reviewer: Dale Worley
  Review result: Ready with Nits

  I am the assigned Gen-ART reviewer for this draft.  The General Area
  Review Team (Gen-ART) reviews all IETF documents being processed
  by the IESG for the IETF Chair.  Please treat these comments just
  like any other last call comments.

  For more information, please see the FAQ at
  <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>.

  Document:  draft-ietf-slim-negotiating-human-language-06
  Reviewer:  Dale R. Worley
  Review Date:  2017-02-17
  IETF LC End Date:  2017-02-20
  IESG Telechat date:  [unknown]

  Summary:
         This draft is basically ready for publication, but has nits
         that should be fixed before publication.

  * Technical comments

  A. Call failure

  If a call fails due to no available language match, in what way(s)
  does it fail?  Section 5.3 says

     If such an offer is received, the receiver MAY
     reject the media, ignore the language specified, or attempt to
     interpret the intent

  But I suspect it's also allowed for the UAS to fail the call at the
  SIP level.  Whether or not that is allowed (or at least envisioned)
  should be described.  And what response code(s)/warn-code(s) should
  be used for that?

The text you quote has been deleted. The draft does not mandate if the call should proceed or fail if there is no language match possible, although the draft does provide an optional mechanism to indicate the caller's preference that the call not fail, and the draft does mention that in the emergency services case, the call will likely proceed, but that's a matter of policy not protocol.

You may have a version where it is proposed that the text is deleted. We need to see that new text and agree if it was good to delete it.

I will be uploading the updated draft shortly. The only question is if I upload it before or after adding any extra examples.

There are more places in the draft where failing the call is mentioned, so the question about how it is failed is relevant anyway. A 603 Decline from the proxy would likely be the natural way to fail the call when it is because of lack of matching languages. But I do not see any natural way for an addressed UA to signal this.


I do not believe the draft needs to mandate how a call is rejected.


  B. Audio/Video coordination

     5.2.  New 'humintlang-send' and 'humintlang-recv' attributes

     Note that while signed language tags are used with a video stream
     to indicate sign language, a spoken language tag for a video
     stream in parallel with an audio stream with the same spoken
     language tag indicates a request for a supplemental video stream
     to see the speaker.

  And there's a similar paragraph in 5.4:

     A spoken language tag for a video stream in conjunction with an
     audio stream with the same language might indicate a request for
     supplemental video to see the speaker.


  I think this mechanism needs to be described more exactly, and in
  particular, it should not depend on the UA understanding which
  language tags are spoken language tags.  It seems to me that a
  workable rule is that there is an audio stream and a video stream and
  they specify exactly the same language tag in their respective
  humintlang attributes.  In that case, it is a request for a spoken
  language with simultaneous video of the speaker, and those requests
  should be considered satisfied only if both streams can be
  established.

The text you quote has been deleted. A media stream for supplemental purposes can be negotiated without a language tag, as normal.

 The text should not be deleted just because it is under discussion.

It was controversial and not needed, hence it was deleted. The WG expressed a goal of publishing a simple document to have something that can be deployed.

It is a valid and valuable alternative. At the moment we lack ways to indicate if languages are wanted together or seen as separate alternatives, and we have said that such detailing can be added in future versions or additional specifications if not done now. Therefore we had better allow this combination and let the negotiating parties sort out what it currently means. The specification just indicates that the indicated languages are alternatives, and any number may be selected for matching and usage in the session. I do not think we should require very exactly matching language tags between spoken language in audio and corresponding view of the speaker in video.

We are not requiring this. On the contrary, a video stream without a language attribute indicates it is used for supplemental purposes, not interactive language. If we need more capabilities, this can be done in the future.

  * The following three items are adjustments to the design which I'd
  like to know have been considered.

  C. "humintlang" seems long to me

  Given the excessive length of SDP in practice, it seems to me that a
  shorter attribute name would be desirable.  E.g., "humlang" as was
  used in some previous versions.  Or is there a coordinated usage with
  other names in the "hum*lang" pattern?

There is no intent for a coordinated pattern. The name was chosen years ago to avoid potential confusion with the 'lang' attribute. Is it worth reopening the issue to potentially save three characters per SDP line with a language?

  D. Use the Accept-Language syntax

  It seems to me that it would be better to use the Accept-Language
  syntax for the attribute values.  This allows (1) specifying the
  quality of language experience, allowing clear description of
  bilingualism, (2) a unified method of specifying whether or not
  arbitrary languages are acceptable, and (3) abbreviating SDP
  descriptions.

  In a way, the fact that the current proposal seems to require (but
  does not directly specify) the coordinated absence/presence of an
  asterisk on all of the repetitions of humintlang-send or
  humintlang-recv is a warning that the syntax doesn't represent the
  semantics as well as it might.

The group considered multiple proposals to permit specifying quality, preference, q-values, etc. but decided to keep things simple for this draft. There is no intent to require the use of an asterisk (to indicate a preference by the caller to not fail the call). The asterisk is a very mild mechanism with no normative effects. It merely conveys the preference of the caller, and is not binding on the answerer.

I would never dare to use an SDP without an asterisk. Language matching is a tricky thing and human capabilities are much wider than the language tags can express. If I get a call from a Norwegian user talking Norwegian and indicating Norwegian in the humintlang attributes, I want to get the call accepted by my device with setting for spoken Swedish, because Swedes and Norwegians usually can communicate quite well speaking their own languages. I would anyway not have imagined making a setting for Norwegian language as a preference in my device. It will however be an excellent help for me to see the indication of the Norwegian, when I get the call, so I can tune my listening. By setting the asterisk somewhere among the humintlang attributes, I will make sure that I do not lose calls that I could handle.

  E. Have an attribute to abbreviate the bidirectionally-symmetric case

  Note that all examples are bidirectionally symmetric, and the text
  says that requests and responses SHOULD be bidirectionally symmetric.
  So it would be a very useful abbreviation to define
  "humintlang=<value>" to be equivalent to the combination of
  "humintlang-send=<value>" and "humintlang-recv=<value>".

  Combining proposals C, D, and E, the examples become

        m=audio 49170 RTP/AVP 0
        a=humlang:en

        m=video 51372 RTP/AVP 31 32
        a=humlang:ase,*;q=0.1

        m=audio 49250 RTP/AVP 20
        a=humlang:es,eu;q=0.9,en;q=0.8,*;q=0.1

        m=text 45020 RTP/AVP 103 104
        a=humlang:gr

  which requires about half as many characters as they have now.

A third attribute without the "-send" or "-recv" to indicate bidirectionality would reduce the characters in the SDP block, at the cost of some added complexity (e.g., what if all three appear). I don't believe this has been discussed in the group.
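
For illustration only, the combined C/D/E syntax from the review could be read roughly as in the Python sketch below. It assumes the hypothetical "humlang" value format shown in Dale's examples (comma-separated tags with optional ";q=" weights). Nothing here is defined by the draft, and parse_humlang is a made-up helper name.

```python
def parse_humlang(value):
    """Parse an Accept-Language style value into (tag, q) pairs.

    Hypothetical helper: the 'humlang' attribute and its q-values are
    the reviewer's proposal, not part of the draft.
    """
    entries = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        q = 1.0  # assumed default weight, as in Accept-Language
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip() == "q":
                q = float(val)
        entries.append((tag, q))
    # sorted() is stable, so tags with equal q keep their listed order
    return sorted(entries, key=lambda e: e[1], reverse=True)


print(parse_humlang("es,eu;q=0.9,en;q=0.8,*;q=0.1"))
```

The sketch also hints at the added complexity mentioned above: a receiver would still need a rule for what to do if "humlang" appeared alongside the "-send" and "-recv" forms.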

  * Editorial comments and nits

  Abstract

     This document describes the need and a solution using new SDP
     stream attributes.

  I don't think the term "stream attribute" is used in RFC 4566.
  Instead, it uses "media attribute".


 Fixed.

  1.  Introduction

     caller and callee know each other or there is contextual or out of
     band information from which the language(s) and media modalities can

  I think in this context, it's preferred to hyphenate "out-of-band" to
  make it clearly be an adjective.

OK.

     This approach has a number of benefits, including that it is
     generic (applies to all interactive communications negotiated
     using SDP) and not limited to emergency calls.

  I think s/and not limited to/and is not limited to/ reads more
  smoothly.


 There's no harm in the extra "is" so I'm happy to add it.

     But it is clearly useful in many other cases.  For
     example, someone calling a company call center or a Public Safety
     Answering Point (PSAP) should be able to indicate if one or more
     specific signed, written, and/or spoken languages are preferred,
     the callee should be able to indicate its capabilities in this
     area, and the call proceed using in-common language(s) and media
     forms.

  I think s/preferred, the callee/preferred; the callee/ because the
  sentence is the concatenation of two sentences.


 I reworded the sentence to flow better:

    For example, it is helpful that someone calling a company call center
    or a Public Safety Answering Point (PSAP) be able to indicate
    preferred signed, written, and/or spoken languages, the callee be
    able to indicate its capabilities in this area, and the call proceed
    using the language(s) and media forms supported by both.

  Perhaps s/in-common/shared/.


 Fixed in the rewording above.


     Including the user's human (natural) language preferences in the
     session establishment negotiation is independent of the use of a
     relay service and is transparent to a voice service provider.

  I think it's even broader than "transparent to a voice service
  provider" -- it's transparent to any service provider, assuming that
  the media are language-neutral.


 I changed it to read "voice or other service provider".


     In the case of a call to e.g., an airline, the call could be
     automatically handled by a Spanish-speaking agent.

  I think s/handled by/routed to/ is the usual usage.

We are trying to be careful in the draft to not imply that it is discussing call routing. I'd rather keep the more generic "handled by".


  3.  Desired Semantics

     The desired solution is a media attribute (preferably per
     direction) that may be used within an offer to indicate the
     preferred language of each (direction of a) media stream, and
     within an answer to indicate the accepted language.

  In this one instance, I think you want to use "language(s)" to drive
  home that multiple languages can be specified:  "within an offer
  to indicate the preferred language(s)".

     (Negotiating multiple simultaneous languages within a media stream
     is out of scope, as the complexity of doing so outweighs the
     usefulness.)

  You might want to say instead "(Negotiating multiple simultaneous
  languages within a media stream is out of scope for this document.)"
  to ensure that nobody decides to argue whether "the complexity of
  doing so outweighs the usefulness".


 I agree and deleted "the complexity of doing so outweighs the usefulness".


  4.  The existing 'lang' attribute

     RFC 4566 [RFC4566] specifies an attribute 'lang' which appears
     similar to what is needed here, but is not sufficiently detailed
     for use here.

  "for use here" isn't quite right.  Maybe "is not sufficiently
  specific or flexible to satisfy the requirements".

     In addition, it is not mentioned in [RFC3264]

  "it" is somewhat ambiguous here, perhaps change to "the 'lang'
  attribute".


 OK, accepted both changes.


  5.  Proposed Solution

  Perhaps /Proposed Solution/Solution/, since once this draft is
  approved, it becomes the solution.

OK.


  5.2.  New 'humintlang-send' and 'humintlang-recv' attributes

        a=humintlang-send:<language tag>
        a=humintlang-recv:<language tag>

  This is presented as the generic form of the attributes, but there is
  no indication of the possible asterisk.


 The syntax has been deleted from 5.2 since it's now in 6.

I think it should not be deleted, but instead Dale's comment should be satisfied. That would be more in line with the style of rfc4566bis. 5.2 shows the complete attributes, and that is good. Chapter 6 only shows the syntax of the value.


Having the syntax in two places is a stylistic matter.


     The values constitute a list of languages
     in preference order (first is most preferred).

  "The values" isn't very clear, because the values are in successive
  attributes.  You want to say something like "The sequence of values
  in the occurrences of one of these attributes constitutes ...".
  However, see the technical comments above.


 The text was reworded to read:

    The values from all
    instances of the attribute constitute a list of languages in
    preference order (first is most preferred).
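
As a rough illustration of the reworded text, the Python sketch below gathers the values from all instances of each attribute within a media section, keeping their order of appearance as the preference order. It is a minimal line scan written for this example, not a full SDP parser, and collect_humintlang is a hypothetical helper name.

```python
def collect_humintlang(sdp_text):
    """Return one dict per m= section with ordered send/recv value lists.

    Assumption: values are kept verbatim (including any trailing
    asterisk) and listed first-is-most-preferred, as the text describes.
    """
    sections = []
    current = None
    for raw in sdp_text.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # A new media description starts a new section
            current = {"media": line[2:].split()[0], "send": [], "recv": []}
            sections.append(current)
        elif current is not None and line.startswith("a=humintlang-"):
            name, _, value = line[2:].partition(":")
            direction = name[len("humintlang-"):]
            if direction in ("send", "recv") and value:
                current[direction].append(value.strip())
    return sections


example = """m=audio 49250 RTP/AVP 20
a=humintlang-send:es
a=humintlang-send:en
a=humintlang-recv:es
"""
print(collect_humintlang(example))
```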


     When placing an emergency call, and in any other case where the
     language cannot be assumed from context, each media stream in an
     offer primarily intended for human language communication SHOULD
     specify both (or in some cases, one of) the 'humintlang-send' and
     'humintlang-recv' attributes.

  Probably s/assumed/inferred/.


 I agree.


  Could you be more accurate by
  s/or in some cases/or for unidirectional streams/?


 I agree.


  5.3.  Advisory vs Required

     The mechanism for indicating this preference is that, in an offer,
     if the last character of any of the 'humintlang-recv' or
     'humintlang-send' values is an asterisk, this indicates a request
     to not fail the call (similar to SIP Accept-Language syntax).
     Either way, the called party MAY ignore this, e.g., for the
     emergency services use case, a PSAP will likely not fail the call.

  The construction of this paragraph isn't quite complete.  It says
  that if an asterisk is present, a request shouldn't fail, but it
  doesn't say that if no asterisk is present, a request should fail if
  there is no language match.  And it's the latter condition that makes
  the second sentence meaningful.  So I think you want to insert
  between the two sentences one regarding the absence of an asterisk.


 I've reworded the section to read:

    A consideration with the ability to negotiate language is if the call
    proceeds or fails if the callee does not support any of the languages
    requested by the caller.  This document does not mandate either
    behavior, although it does provide a way for the caller to indicate a
    preference for the call succeeding when there is no language in
    common.  It is OPTIONAL for the callee to honor this preference.  For
    example, a PSAP is likely to attempt the call even without an
    indicated preference when there is no language in common, while a
    call center might choose to fail the call.

    The mechanism for indicating this preference is that, in an offer, if
    the last character of any of the 'humintlang-recv' or 'humintlang-
    send' values is an asterisk, this indicates a request to not fail the
    call.  The called party MAY ignore the indication, e.g., for the
    emergency services use case, regardless of the absence of an
    asterisk, a PSAP will likely not fail the call; some call centers
    might reject a call even with an asterisk.

 This still does not meet Dale's comment.
 Insert a sentence saying:

"A preference for getting the call denied in case of no language match SHOULD be indicated by no asterisk appended on any humintlang attribute value in the whole SDP."

I don't think that's needed, and it's a strangely constructed normative directive. How can a preference have a SHOULD?
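
To make the asterisk semantics under discussion concrete, here is a small Python sketch of how an answerer might read the offer. It assumes the rule as worded in the revised 5.3 text (an asterisk as the last character of any value signals the caller's preference that the call not fail). The function names are hypothetical, and the exact, case-insensitive tag comparison is a deliberate simplification of real language-tag matching.

```python
def prefers_call_not_to_fail(values):
    """True if any offered humintlang value ends with an asterisk."""
    return any(v.strip().endswith("*") for v in values)


def strip_asterisk(value):
    """Return the language tag without a trailing asterisk, if present."""
    v = value.strip()
    return v[:-1] if v.endswith("*") else v


def first_common_language(offered, supported):
    """Return the first offered tag the callee supports, else None.

    Simplification: exact, case-insensitive comparison only.
    """
    supported_lower = {s.lower() for s in supported}
    for value in offered:
        tag = strip_asterisk(value)
        if tag.lower() in supported_lower:
            return tag
    return None


offer = ["no*", "sv"]
print(prefers_call_not_to_fail(offer))              # caller's preference only
print(first_common_language(offer, ["SV", "en"]))   # language match, if any
```

Whatever the first function returns, the callee remains free to proceed with or fail the call; the sketch only surfaces the caller's stated preference.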

  5.5.  Examples

  Given that the combined audio/video mechanism is the only
  irregularity in this system, there ought to be an example of it.  E.g.,

     An example of a supplemental video stream with a spoken language
     audio stream:

        m=video 51372 RTP/AVP 31 32
        a=humintlang-send:en
        a=humintlang-recv:en

        m=audio 49250 RTP/AVP 20
        a=humintlang-send:en
        a=humintlang-recv:en

If the video stream is supplemental then it doesn't have a language (the text that suggested otherwise has been deleted). But I am considering adding more examples.

I provided a rich set of examples in my LC comments of Feb 13. Please consider them as a base even if some need revision if the proposed extended meaning of the asterisk is not accepted.


Thank you for submitting them; I am considering them.


  6.  IANA Considerations

        humintlang-value =  Language-Tag [ asterisk ]
                            ; Language-Tag defined in RFC 5646
        asterisk         =  "*"

  s/Language-Tag defined in RFC 5646/Language-Tag as defined in RFC
  5646/

  But perhaps also s/RFC 5646/BCP 47/, which ensures that "humintlang"
  tracks the current version of language tags.

Ok.


  Appendix A.  Historic Alternative Proposal: Caller-prefs

     This
     results in a more fragile solution since the media modality and
     language would be negotiated using SIP, and then the specific
     media formats (which inherently include the modality) would be
     negotiated at a different level (typically SDP, especially in the
     emergency calling cases), making it easier to have mismatches
     (such as where the media modality negotiated in SIP don't match
     what was negotiated using SDP).

  "the media modality and language would be negotiated using SIP" isn't
  quite the right way to say it because SIP isn't explicitly
  negotiating the modality.  Better would be

     ... the language (and by implication the media modality) would be
     negotiated using SIP, and then the specific media (which
     inherently include the modalities and formats) would be
     negotiated at a different level ...


 This section has been deleted.

 Did we agree on that?

It was suggested in earlier comments to delete it as it is clearly not needed (it's an informative appendix detailing a historical alternative that was not pursued).


  [END]

 Regards
 Gunnar

 --
 -----------------------------------------
 Gunnar Hellström
 Omnitor
 gunnar(_dot_)hellstrom(_at_)omnitor(_dot_)se
 +46 708 204 288



--
Randall Gellens
Opinions are personal;    facts are suspect;    I speak for myself only