Well, if you REALLY want open issues no matter how unresolvable, you might
as well add the entire boundary marking scheme. It's just too complex.
Reading through the entire message (possibly more than once) to make suer
that your chosen string is unique is not reasonable in my view.
I will only dignify this old gremlin with an answer once. It is not necessary
to read through the entire message to know with reasonable confidence that
your marker is unique. This has been discussed on the list previously; please
read those postings. Either challenge that probabilistic approach as being
unworkable or stop making this objection.
I now have significant operational experience with the boundary marker
approach in production software and I have had no problems with it at all.
I happen to like probabilistic algorithms. I use them every day and so do
you if you have an Ethernet. (I suppose I could get even wierder and ask
if you believe in quantum mechanics or perhaps the phone system, but I'll skip
that.) However, I know some people don't believe in probabilistic approaches,
so...
You can also generate a guaranteed unique boundary marker in a SINGLE pass
using storage bounded by the message size and time bounded by the
O(NlogN), where N is message size. I think such an approach is gross overkill,
but it is definitely doable. A second pass is NEVER necessary if you use
the proper algorithm.
In case you don't believe this, here's how. Start with any quality algorithm
for storing and retrieving sparse sets of large integers. You can put a
bound on the largest integer of, say, 2^64. This is then a bound on the largest
message you can guarantee to process, but I don't think that you will see
messages larger than this.
Plenty such algorithms are available that take time bounded by O(MlogM) in the
worst case, and space bounded by O(M) worst case, where M is the number of
integers stored. See Sedgewick's algorithms if you want some example
algorithmic building blocks you can use to build this sort of thing. I
recommend fibonacci heaps, but there are lots of approaches.
Now scan the message, and for any line that begins with a -- and is followed
by an integer that falls within the range allowed by your algorithm, store
it.
Once processing is complete simply scan the set and find an integer that did
not appear. The ONLY time this can possibly fail is if the message contained
all 2^64 integers you can represent. In order for a message to meet this
criterion it would have to be pretty large. Thus, for bounded message
size there exists an algorithm that will produce a guaranteed unique
marker in bounded time.
You are welcome to poke a hole in this if you can.
[I'd rather count lines in 1154! After all the only real problem with lines
is that some older mailers wrap illegally. I suspect those mailers will get
fixed as they get upgraded to 8-bit SMTP... ]
Many of the offending MTAs are already 8-bit capable. There is no need to
upgrade them, and even if the future of SMTP provides us an upgrade (which
is FAR from evident at this point) we cannot possibly rely on this for the
purposes of RFC-XXXX.
Secondly, put the fact that the encoding description is scattered among
lots of parallel attributes when (I think) it should be a description of
a set of transformations (each of which might have options).
I am totally in agreement with Nathaniel on this one. I was originally
a believer in the pipe approach, but Nathaniel and others have convinced
me that the additional complexity of sets of transformations is nothing
less than an invitation to disaster. I still think of it as a set of pipes,
but that's not what I want to see in the standard.
Ned