Den 2017-02-13 kl. 22:58, skrev Bernard Aboba:
Gunnar said:
"With some hesitation I suggest to let it mean to see a speaking person."
[BA] Is this for the purpose of enabling lip reading?
Yes
Assuming that we go that way, how would captioning be negotiated?
It is best placed in text media.
But captions overlaid on video in the media stream are an established
technology, so it would be good to be able to specify them.
That we cannot is, again, an unfortunate effect of language tags not
distinguishing between the spoken and written modalities.
I once had an ambition to specify a notation for that to be added to
BCP 47, but did not succeed in getting any real discussion going on the
topic.
Eventually there may be a need to specify a Modality attribute. That may
be needed for media specified e.g. as m=application, where the protocol
can carry all kinds of modalities and it is not apparent from the m-line
which one is intended. Such media are, however, not common for real-time
conversational purposes, so I do not think it is urgent to solve the
problem for m=application now. But maybe for captioning in video media?
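To make the idea concrete, a purely hypothetical attribute for captioning in video media could look like the sketch below. Neither the attribute name "hlang-modality" nor its values exist in the draft or in BCP 47; they are invented here only to illustrate what a modality notation might convey:

   m=video 51372 RTP/AVP 31
   a=hlang-recv:en
   a=hlang-modality:written

Here "written" would select captions overlaid on the video, while "spoken" would select a view of a speaking person.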
/Gunnar
On Mon, Feb 13, 2017 at 1:23 PM, Gunnar Hellström
<gunnar.hellstrom@omnitor.se> wrote:
Bernard,
I just issued comments where I also included the "silly states"
topic with similar views as yours.
Den 2017-02-13 kl. 20:06, skrev Bernard Aboba:
Looking over Section 5.4, it seems to me that the title "Silly
States" may not be appropriate, because it mixes discussion of
combinations of media and language that have an "undefined"
meaning with combinations for which normative guidance can be
provided. So rather than having a single "Silly States" section,
perhaps we can have a section on "Undefined States" (for those
combinations which have an undefined meaning) and provide normative
guidance on defined combinations elsewhere.
5.4. Silly States
<https://tools.ietf.org/html/draft-ietf-slim-negotiating-human-language-06#section-5.4>
It is possible to specify a "silly state" where the language
specified does not make sense for the media type, such as specifying
a signed language for an audio media stream.
An offer MUST NOT be created where the language does not make sense
for the media type. If such an offer is received, the receiver MAY
reject the media, ignore the language specified, or attempt to
interpret the intent (e.g., if American Sign Language is specified
for an audio media stream, this might be interpreted as a desire to
use spoken English).
A spoken language tag for a video stream in conjunction with an audio
stream with the same language might indicate a request for
supplemental video to see the speaker.
[BA] Rather than using terms like "might" for combinations that could have a
defined meaning, I would like to see the specification provide normative
language on these use cases. In particular, I would like the specification
to describe:
a. What it means when a spoken language tag is included for a video stream.
Is this to be interpreted as a request for captioning?
b. What it means when a signed language tag is included for an audio stream.
Is the meaning of this "undefined" and if so, should it be ignored?
c. What it means when a signed language tag is included for a text stream.
If some of these scenarios are not defined, the specification can say
"this combination does not have a defined meaning" or something like that.
See my recent comments for more views. I support the idea of being
normative and specific where possible.
A complication is that there is no distinction between language
tags for written and spoken language.
So we have the following possible combinations and interpretations
of "silly states":
1. Spoken/written tag in video media: can mean seeing a speaking
person, or captions overlaid on the video.
With some hesitation I suggest letting it mean seeing a speaking
person. The draft adds a requirement that the same language appear
in the audio stream in the same direction for that interpretation
to apply. Should that mean that if there is another language in
the audio stream, then the spoken/written tag in the video stream
means captions in the specified language? That sounds useful for
some cases, but it is complex to interpret and unfair to the users
who would benefit from captions in the same language as in audio.
Summary: I think we had better use the interpretation of seeing a
speaking person, regardless of what language is indicated for audio.
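Under this interpretation, an offer asking to receive spoken English audio together with supplemental video of the speaker could look like the following sketch. The hlang-recv attribute name is taken from the draft; the exact syntax at this draft version is an assumption, and the ports and payload types are illustrative:

   m=audio 49170 RTP/AVP 0
   a=hlang-recv:en
   m=video 51372 RTP/AVP 31
   a=hlang-recv:en

The same spoken-language tag on both streams would then mean "video of the person speaking English", not captions.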
2. Signed language tag in audio media: can mean audio from a
signing person. That could be anything from near silence to
spoken words corresponding to the signed signs as far as feasible.
This is usually seen as disturbing by sign language users, but it
exists, e.g. when one person needs to communicate with both hearing
and deaf persons simultaneously. There are also variants of
signing, called sign-supported language, with signs expressed in
spoken-language word order and grammar. That can more easily be
combined with spoken language, but would more likely be indicated
by a spoken language tag in audio media.
Summary: I am inclined to let a signed language tag in audio media
mean audio from the signing person, possibly used for the rare
cases when it has some relevance for language communication.
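Under this interpretation, an offer where the audio stream simply carries whatever sound the signing person produces could be sketched as below, with "ase" being the BCP 47 tag for American Sign Language. The hlang-send attribute name follows the draft, though the exact syntax at this draft version is an assumption:

   m=audio 49170 RTP/AVP 0
   a=hlang-send:ase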
3. Sign language tag in text media: there are some ways to
represent sign language in various kinds of symbol or text
representation. Some are represented in Unicode. One is a system
called SignWriting. Some fingerspelling methods also have fonts
corresponding to characters in code pages. There is also an
informal way to write manuscripts for signing, in words with
capitals approximately corresponding to signs, often with some
notation added for sign-language-specific ways of expression that
have no direct correspondence to words. None of these systems is
common in real-time conversation, but I have seen examples of
such use.
Summary: I think we can leave freedom here and just specify that a
sign language tag in text media means some representation of sign
language, or of a corresponding fingerspelling system, in the text
media.
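A sketch of that interpretation, assuming the draft's hlang attributes and T.140 real-time text as the transport (the exact attribute syntax and the example port and payload type are assumptions): a sign language tag on the text media would then mean some written representation of the signed language, e.g. SignWriting, or a corresponding fingerspelling system:

   m=text 45020 RTP/AVP 98
   a=rtpmap:98 t140/1000
   a=hlang-recv:ase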
If these conclusions are accepted, we can formulate modified text.
Note that the case with spoken/written language tag in video media
is mentioned in two places in the draft.
Regards
Gunnar
_______________________________________________
SLIM mailing list
SLIM@ietf.org
https://www.ietf.org/mailman/listinfo/slim
<https://www.ietf.org/mailman/listinfo/slim>
--
-----------------------------------------
Gunnar Hellström
Omnitor
gunnar.hellstrom@omnitor.se
+46 708 204 288