
Re: [Nmh-workers] nmh architecture discussion: format engine character set

2015-08-12 14:34:29
On Wed, Aug 12, 2015 at 1:07 PM, Ken Hornstein wrote:

It appears the basic processing model is a pipeline:

 Raw -> [Encoder] -> UTF8 -> [Processor] -> UTF8 -> [Encoder] -> Output
...
We're going to a point where UTF-8 is going to appear in email
addresses.  That's technically allowed today under the new RFCs.  The
problem then becomes "Okay, 'Output' in the above stage needs to be
'Input' when doing message replies.  How, exactly, do we do that?"

I see the Processor as nmh application logic: it will always operate in
the UTF8 realm.  The Encoders are basically I/O filters that are applied
at the input and output stages.

Take the reply command.  The first thing it needs to do is read the
original email data to generate the draft template for editing.  The
initial read operation is filtered thru the Encoder first.  The result
is passed into the nmh engine to parse header fields and other jazz to
create the draft message (all of this is done in the UTF8 world).  When
writing the draft, the data is piped thru the encoder then written to
disk before launching the editor (hopefully it is a no-op, but if in a
non-UTF8 locale...).

After editing, the draft is now the "Raw" input, repeating the pipeline
again for whatever nmh is instructed to do with the draft.
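As a rough illustration of the flow described above (not nmh code; nmh is written in C), here is a minimal Python sketch.  The function names are hypothetical, the "application logic" is reduced to building a reply Subject line, and Python's Unicode str type stands in for the internal UTF8 form.

```python
def decode_input(raw: bytes, charset: str) -> str:
    """Input filter: bytes in the source charset -> internal Unicode text."""
    return raw.decode(charset, errors="replace")


def encode_output(text: str, charset: str) -> bytes:
    """Output filter: internal Unicode text -> bytes in the target charset."""
    return text.encode(charset, errors="replace")


def make_reply_subject(headers: str) -> str:
    """Stand-in for application logic; it only ever sees decoded text."""
    for line in headers.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "subject":
            return "Subject: Re: " + value.strip()
    return "Subject: Re: (no subject)"


def reply_pipeline(raw: bytes, in_charset: str, out_charset: str) -> bytes:
    """Raw -> [decode] -> text -> [process] -> text -> [encode] -> output."""
    text = decode_input(raw, in_charset)
    draft = make_reply_subject(text)
    return encode_output(draft + "\n", out_charset)


# An ISO-8859-1 message is read, processed as Unicode, and the draft is
# written out as UTF-8 (the assumed locale charset in this example).
original = "From: someone@example.org\nSubject: Caf\u00e9 menu\n".encode("iso-8859-1")
draft_bytes = reply_pipeline(original, "iso-8859-1", "utf-8")

# After editing, the draft is the new "Raw" input for the next pass.
second_pass = reply_pipeline(draft_bytes, "utf-8", "utf-8")
```

The point of the split is that make_reply_subject() never needs to know which charset the message arrived in or which one the draft is written in.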

I know this may elicit some groans, but I work with Java daily.
Internally, all strings are Unicode (technically they are not, but the
difference is irrelevant for this discussion).  It is the job of the I/O
readers and writers to deal with conversion between non-Unicode
encodings and Unicode.  I.e., before my application logic can do anything
with textual information, it gets "converted" by a Reader, and the app
only then deals with Unicode characters.  When I write output, the
Writer then converts to whatever the destination encoding is.
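The same reader/writer split can be sketched in Python, where io.TextIOWrapper plays roughly the role of a Java Reader or Writer wrapped around a byte stream.  This is only an analogy to the model described above; the sample text and the choice of ISO-8859-1 and UTF-8 are assumptions made for the example.

```python
import io

# Byte-level source and sink; in practice these would be files or sockets.
source = io.BytesIO("na\u00efve r\u00e9sum\u00e9\n".encode("iso-8859-1"))
sink = io.BytesIO()

# The wrappers are the only places where an encoding is named.
reader = io.TextIOWrapper(source, encoding="iso-8859-1", newline="")
writer = io.TextIOWrapper(sink, encoding="utf-8", newline="")

# Application logic sees Unicode text only.
text = reader.read()
shouted = text.upper()

writer.write(shouted)
writer.flush()  # push buffered text through the encoder into the sink
result = sink.getvalue()
```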

Perl even supports a similar model for its I/O streams (if you choose to
use it).

--ewh

P.S.  Things may be a bit more complicated when dealing with MIME entity
parsing, where each entity could be in a different encoding.  In that
case, each entity would have to be passed thru the encoder for
normalization.
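To make the per-entity case concrete, here is a small sketch using Python's standard email package; it is not how nmh parses MIME.  The helper name decoded_text_parts is hypothetical, and falling back to us-ascii when a part declares no charset is an assumption of this example.

```python
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def decoded_text_parts(raw: bytes) -> list:
    """Return each text entity as Unicode, decoded with its own charset."""
    msg = message_from_bytes(raw)
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and non-text entities
        payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
        charset = part.get_content_charset() or "us-ascii"
        parts.append(payload.decode(charset, errors="replace"))
    return parts


# Build a two-part test message whose entities use different charsets.
container = MIMEMultipart("mixed")
container.attach(MIMEText("Gr\u00fc\u00dfe", "plain", "iso-8859-1"))
container.attach(MIMEText("\u3053\u3093\u306b\u3061\u306f", "plain", "utf-8"))
raw = container.as_bytes()

texts = decoded_text_parts(raw)
```

Each entity goes through its own decode step, so everything downstream of decoded_text_parts() works on a single normalized representation.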

