Re: MIME Clarifications STILL wanted

Some of the Rich Text 'verbs' have subtle semantic ambiguities when
self-nested.
For example, should <underline><underline> be used to delineate a double
underline?  <Bold><Bold> might be useful in a context where degrees of
boldness are possible.  <Italic><Italic> probably is nonsense, but I am not
willing to bet the farm on that.


I'm going to take a stab at this. Please note that I don't regard richtext
as exactly my forte, but this happens to be a question that interests me.

First of all, a check with the SGML standards didn't turn up any information
about this as far as I could see. The effect of nesting seems to be entirely
up to the DTD; I could not find anything that indicates there's some sort
of general abstract meaning associated with it.

I don't claim to be anything approaching an expert on SGML, but I had the
documents handy so I thought checking them was a good idea. If some SGML expert
wants to put an oar in here I'd be interested in hearing more about the
abstract connotations of nesting in SGML.

This being said, I think it is interesting to consider these one at a time and
think a little about it from a common sense perspective.

(1) Underlining. If I have some underlined text that I need to further
    underline (this actually happens in some notational systems for
    certain types of mathematics) I think the logical thing to do is
    underline it twice. In other words, underlining is cumulative and
    doesn't cancel itself out. Now, there's a lot of equipment that can
    do underlining that cannot do double underlining (the VT320 I'm using
    now is a good example). I think the best that can be done in this case
    is to just underline once no matter how many nested levels of underlining
    are in effect.

(2) Bolding is more or less equivalent to underlining, except that the ability
    to double-bold is probably somewhat more common. (Things like demibold
    fonts are often available.)

(3) Italics is another matter. The convention most often used with nested
    italics is to switch back into the original font. I have seen this
    nested up to 4 levels in some publications; nesting of 2 levels is
    very common indeed. It is also rare to find that more than one italic
    font is available and appropriate; in practice italics can stretch
    readability fairly far and double italics would be simply too much.

    Now, there's nothing that prevents documents from doing this sort of
    thing without ever resorting to nested italics. Or is there? It does
    mean that if you're including one richtext segment inside of another
    that has italics enabled you'll have to scan it and flip the italics
    settings around throughout. I don't think this is desirable.

    I therefore think common sense dictates that italics be a toggle
    (each nesting flips it). If an implementation has more than one
    italic font available it should feel free to use it -- it can even
    cycle through all the italic variations available before arriving
    back at the original font.
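
To make the three rules above concrete, here is a minimal sketch in Python.
It is only an illustration of the interpretation I described, not anything
taken from the spec. The function name effective_style and the parameters
max_underline, max_bold and italic_variants are hypothetical; they stand for
whatever the output device happens to support.

```python
def effective_style(open_tags, max_underline=1, max_bold=1, italic_variants=1):
    """Map the verbs currently in effect to what a device can actually show.

    open_tags: lowercase verb names in effect, outermost first.
    max_underline, max_bold: deepest level the device supports (assumed).
    italic_variants: number of distinct italic fonts available (assumed).
    """
    underline_depth = open_tags.count("underline")
    bold_depth = open_tags.count("bold")
    italic_depth = open_tags.count("italic")
    return {
        # Cumulative, but clamped to what the device can render.
        "underline": min(underline_depth, max_underline),
        "bold": min(bold_depth, max_bold),
        # 0 is the original font. Each nesting level steps through the
        # italic variants and then wraps back to the original font.
        "italic": italic_depth % (italic_variants + 1),
    }
```

With the defaults (a VT320-like device and one italic font), two nested
underlines still give a single underline and two nested italics give the
original font back. Passing max_underline=2 gives a double underline.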

I quite frankly don't feel strongly about any of this, and if this group
decides a different interpretation is appropriate I will not have a problem
with it.

My concern is not with these, which are admittedly a bit
exotic, but rather with the verbs Smaller and Bigger, where non-trivial
documents will routinely have to deal with this hierarchical nesting.

Consider the following text:

               <12>...<6>...<9>...<12>

I can see 2 alternative RT representations, assuming 12pt was 'normal':

        ...<Smaller><Smaller>...</Smaller>...</Smaller>...

and

        ...<Smaller><Smaller>...</Smaller></Smaller> <Smaller>...</Smaller>...

The first is more concise, and probably more elegant, but it is a bit of a
pain to support, as it requires a hierarchical history.

The second can be implemented by a simpler parser, which need only track the
most recent 'on' position, rather than a list of them, and which would also
track the depth (to use as an index into the user's preferred display sizes
list).


I don't think your parser really has an option here; I think it has to remember
how many <smaller>'s are in effect and act accordingly. The overhead associated
with this is very small as far as I can see.

This means that the burden is on the parser to behave the same way regardless of
its input. Any application generating this stuff can then use either form.
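
As a rough sketch of how little bookkeeping this takes, the Python below
keeps a single depth counter for <Smaller> and reports the depth in effect
for each run of text. The function name segment_depths is hypothetical, and
the sketch assumes the input contains no verbs other than Smaller; a real
richtext parser has to handle the rest of the verbs as well.

```python
import re

def segment_depths(richtext):
    """Return (text, smaller_depth) pairs for input using only <Smaller>."""
    depth = 0
    segments = []
    # The capturing group keeps the tags in the split result.
    for token in re.split(r"(</?smaller>)", richtext, flags=re.IGNORECASE):
        lowered = token.lower()
        if lowered == "<smaller>":
            depth += 1
        elif lowered == "</smaller>":
            depth -= 1
        elif token.strip():
            # Whitespace-only runs between tags are ignored in this sketch.
            segments.append((token.strip(), depth))
    return segments

nested = "a<Smaller><Smaller>b</Smaller>c</Smaller>d"
flat = "a<Smaller><Smaller>b</Smaller></Smaller> <Smaller>c</Smaller>d"
same_result = segment_depths(nested) == segment_depths(flat)
```

Both forms from the example produce the same list of depths, so the depth
can be used directly as the index into the list of display sizes.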

A more interesting question is one of differential sizes. For example, suppose
I have the following:

       .A.<Smaller><Larger>.B.</Larger></Smaller>

There are two possible results. One possible definition is "smaller reduces
size by 2 points; larger increases it by 2 points". In this case ".A." and
".B." are the same size. Another possibility is "smaller reduces size by 20%
of the current size; larger increases it by 20% of the current size".
Assuming you start with 10 point type, smaller gives 8 points and larger then
gives 9.6 points, so you end up with .B. in 9.6 point type using this
definition and not 10 point type.
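
The arithmetic for the two definitions can be checked with a short Python
sketch. The function names additive and relative and their step and factor
parameters are made up for illustration; the 2 point and 20% figures are
just the ones used in the example above.

```python
def additive(size, verbs, step=2.0):
    """Each verb moves the size by a fixed number of points."""
    for verb in verbs:
        size += step if verb == "larger" else -step
    return size

def relative(size, verbs, factor=0.20):
    """Each verb scales the size currently in effect by a percentage."""
    for verb in verbs:
        size *= (1 + factor) if verb == "larger" else (1 - factor)
    return size

verbs_around_b = ["smaller", "larger"]
size_b_additive = additive(10.0, verbs_around_b)   # back to 10 points
size_b_relative = relative(10.0, verbs_around_b)   # about 9.6 points
```

Under the additive definition the two verbs cancel exactly. Under the
relative one they do not, because the 20% increase is taken from the
already reduced size.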

In practice this may or may not be an issue, but we probably should tie off
this loose end, if only with a discussion. I have no opinion at all about which
is correct; the relative percentage model is the one I see more often in
practice but I have no special fondness for it.

Mailstrom can go either way on this; I am asking you (all) for your sense of
what would constitute a proper interpretation of rfc1341.


My sense is that your parser must produce the same results for either input
so your generator should feel free to use either one.

Everyone seems to be ducking it.


Not true. Nathaniel answered you right away with this:

This is certainly a point that should be clarified in the next version of the
spec.  My own interpretation of it has always been that multiple smaller
or bigger tokens should indeed have a differential effect, but it never
even occurred to me to think that multiple underline tokens would imply
double underlining.  Since double underlining is probably not even possible
on lots of systems (e.g. video terminals), I'm inclined to say this is not
the case.


I can certainly live with this position. The only case I have some
reservations about is nested italics.

As for the two different ways to represent text that goes from large to
small to in between and back to large, they both look fine to me.  I would
think that a good richtext parser should be able to display either one,
and that it is acceptable to generate either one as well.  -- Nathaniel


I agree completely with Nathaniel's statement on this.

So there you have it -- two people have now expressed opinions on this issue. 

It occurs to me, however, that you seem to be expecting some sort of ruling on
this point and an absolute answer in the short term. This simply isn't the way
the process works. When an issue like this is raised we discuss it on the
mailing list. There may be as few as one message on simple and noncontroversial
issues. Or there may be lots. Nathaniel and I as authors then try to distill
out some consensus from the discussion (or lack of discussion). We feed this
back into the document. Eventually a new draft appears and if we got it wrong
people jump on us. But the key point is that the process takes time and
you cannot expect to get an absolute ruling right away.

In this particular case Nathaniel expressed his opinion on how things worked
and as far as I can tell nobody has objected to it. (There may be strong
objections to richtext as an entity but this is a separate matter.) I have now
jumped in and spoken my piece. You may now decide that you have enough
information to implement. But it is entirely possible that somebody will show
up at the next IETF and convince the entire Working Group (us included) that
we're dead wrong and it will get changed. It has happened before. The price you
pay for an open standards process with significant implementation feedback is
that you may do it wrong and have to change something later. But the net result
is that we get tested and interoperable standards, and this seems to be a pretty
small price to pay for such a result.

                                Ned