Re: [Slim] IETF last call for draft-ietf-slim-negotiating-human-language (Section 5.4)

2017-02-13 16:23:35
On 2017-02-13 at 22:58, Bernard Aboba wrote:
Gunnar said:

"With some hesitation I suggest to let it mean to see a speaking person."

[BA] Is this for the purpose of enabling lip reading?
Yes

Assuming that we go that way, how would captioning be negotiated?
It is best placed in text media.

But captions overlaid on video in the media stream are an established technology, so it would be good to be able to specify them. That we cannot do so is again a sad effect of the language tags not distinguishing between spoken and written modality. I once had an ambition to specify a notation for that, to be added to BCP 47, but did not succeed in getting any real discussion going on the topic.

Eventually there may be a need to specify a Modality attribute. That may be needed for media specified e.g. as m=application where the protocol can carry all kinds of modality and it is not apparent from the m-line what it is. These are however not common for real-time conversational purposes, so I do not think it is urgent to solve the problem for m=application now. But maybe for captioning in video media?

/Gunnar



On Mon, Feb 13, 2017 at 1:23 PM, Gunnar Hellström <gunnar(_dot_)hellstrom(_at_)omnitor(_dot_)se> wrote:

    Bernard,

    I just issued comments where I also included the "silly states"
    topic with similar views as yours.


    On 2017-02-13 at 20:06, Bernard Aboba wrote:
    Looking over Section 5.4, it seems to me that the title "Silly
    States" may not be appropriate, because it mixes discussion of
    combinations of media and language that have an "undefined"
    meaning with combinations for which normative guidance can be
    provided. So rather than having a single "Silly States" section,
    perhaps we can have a section on "Undefined States" (for those
    combinations which have an undefined meaning) and provide normative
    guidance on defined combinations elsewhere.


          5.4.  Silly States
          <https://tools.ietf.org/html/draft-ietf-slim-negotiating-human-language-06#section-5.4>



        It is possible to specify a "silly state" where the language
        specified does not make sense for the media type, such as specifying
        a signed language for an audio media stream.
        An offer MUST NOT be created where the language does not make sense
        for the media type.  If such an offer is received, the receiver MAY
        reject the media, ignore the language specified, or attempt to
        interpret the intent (e.g., if American Sign Language is specified
        for an audio media stream, this might be interpreted as a desire to
        use spoken English).

        A spoken language tag for a video stream in conjunction with an audio
        stream with the same language might indicate a request for
        supplemental video to see the speaker.
    [BA] Rather than using terms like "might" for combinations that could have a
    defined meaning, I would like to see the specification provide normative
    language on these use cases. In particular, I would like the specification
    to describe:
    a. What it means when a spoken language tag is included for a video stream.
    Is this to be interpreted as a request for captioning?
    b. What it means when a signed language tag is included for an audio stream.
    Is the meaning of this "undefined" and if so, should it be ignored?
    c. What it means when a signed language tag is included for a text stream.
    If some of these scenarios are not defined, the specification can say
    "this combination does not have a defined meaning" or something like that.
    See my recent comments for more views. I support the idea to be
    normative and specific when possible.
    A complication is that there is no difference between language
    tags for written and spoken language.

    So we have the following possible combinations and interpretations
    of "silly states"

    1. Spoken/written tag in video media, can mean to see a speaking
    person, or to provide captions overlaid on video.
    With some hesitation I suggest to let it mean to see a speaking
    person. The draft adds a requirement to have the same language in
    the audio stream in the same direction to have that
    interpretation.  Should that mean that if there is another
    language in the audio stream, then the spoken/written tag in the
    video stream should mean captions in the specified language? That
    sounds useful for some cases, but complex to interpret and unfair
    to the users who would benefit from captions in the same language
    as in audio.
    Summary: I think we had better use the interpretation to see a
    speaking person regardless of what language is indicated for audio.

    2. Signed language tag in audio media, can mean audio from a
    signing person. That could be anything between near silence and
    spoken words corresponding to the signed signs as far as feasible.
    This is usually seen as disturbing to sign language users but it
    exists, e.g. when one person needs to communicate with both hearing
    and deaf persons simultaneously. There are also variants of
    signing, called sign supported language, with signs expressed with
    spoken language word order and grammar. That can more easily be
    combined with spoken language, but would more likely be indicated
    by spoken language tag in audio media.
    Summary: I am inclined to let signed language tag in audio media
    mean audio from the signing person and possibly used for the rare
    cases when it has some relevance for language communication.

    3. Sign language tag in text media. There are some ways to
    represent sign language in various kinds of symbol or text
    representation. Some are represented in Unicode. One is a system
    called Sign Writing. Some fingerspelling methods also have fonts
    corresponding to characters in code pages. There is also an
    informal way to write manuscripts for signing in words with
    capitals approximately corresponding to signs, often with some
    notation added for unique sign language ways of expression that
    have no direct correspondence to words. None of these systems above
    are common in real-time conversation, but I have seen examples of
    such use.
    Summary: I think we can leave freedom here and just specify that a
    sign language tag in text media means some representation of sign
    language or a corresponding fingerspelling system in text media.

    If these conclusions are accepted, we can formulate modified text.
    Note that the case with spoken/written language tag in video media
    is mentioned in two places in the draft.
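
The three summaries above can be read as a small lookup from (tag modality, media type) to an interpretation. The following is only a sketch of that reading, not text from the draft: the names `Modality`, `SIGNED_PRIMARY_SUBTAGS`, `modality_of` and `interpret` are hypothetical, and the short set of sign language subtags is an illustrative assumption. A real implementation would consult the IANA language subtag registry instead.

```python
from enum import Enum
from typing import Optional


class Modality(Enum):
    SPOKEN_OR_WRITTEN = "spoken/written"
    SIGNED = "signed"


# Assumption for illustration: a tiny sample of sign language subtags
# (ase = American, bfi = British, swl = Swedish Sign Language, sgn = the
# sign language collection code). This is not a complete list.
SIGNED_PRIMARY_SUBTAGS = {"ase", "bfi", "swl", "sgn"}


def modality_of(language_tag: str) -> Modality:
    """Classify a language tag by its primary subtag only."""
    primary = language_tag.split("-")[0].lower()
    if primary in SIGNED_PRIMARY_SUBTAGS:
        return Modality.SIGNED
    # Language tags do not distinguish spoken from written language,
    # so everything else falls into one combined class.
    return Modality.SPOKEN_OR_WRITTEN


# The three proposed interpretations, keyed by (modality, media type).
PROPOSED_MEANING = {
    (Modality.SPOKEN_OR_WRITTEN, "video"): "view of the speaking person",
    (Modality.SIGNED, "audio"): "audio from the signing person",
    (Modality.SIGNED, "text"): "text representation of sign language or fingerspelling",
}


def interpret(language_tag: str, media_type: str) -> Optional[str]:
    """Return the proposed meaning, or None for combinations not covered above."""
    return PROPOSED_MEANING.get((modality_of(language_tag), media_type.lower()))


if __name__ == "__main__":
    for tag, media in [("en", "video"), ("ase", "audio"), ("swl", "text"), ("en", "audio")]:
        print(tag, media, "->", interpret(tag, media))
```

Combinations outside the three cases, such as a spoken language tag in audio media, return None here because they are the ordinary cases and need no special interpretation.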

    Regards
    Gunnar



    _______________________________________________
    SLIM mailing list
    SLIM(_at_)ietf(_dot_)org
    https://www.ietf.org/mailman/listinfo/slim

-- -----------------------------------------
    Gunnar Hellström
    Omnitor
    gunnar(_dot_)hellstrom(_at_)omnitor(_dot_)se
    +46 708 204 288
