Re: grouping + global variable (?) (was re: regexs, grouping (?) and XSLT2?)

Hi Deirdre,

I do hope this thread provokes others to contribute.

At 11:05 PM 8/13/2004, you wrote:

Quite frankly, I hadn't realized we were so cutting edge. :)

Well, up-conversion has been going on as long as e-text has. But XML is only
now beginning to catch up to where Omnimark and even Perl were years ago.
And to do it in public and share is a new thing, since it's both difficult
and profitable enough that it is best understood by the data-conversion
vendors, who know a lot about it but whose methods and technologies tend to
be proprietary for understandable reasons (it's their bread and butter).

Also, it is a tough enough business in the general case that often it's
dealt with by throwing people at it, not machines, or both together. In
many cases when input is underspecified or unconstrainable this cannot be
avoided.

Ultimately, my goal is to provide an application that offers integration
between the text file (written using the user's text processor of choice).


Yes: understood.

User wants to submit a manuscript, then the application performs all the
necessary generation of the document (including cover letter) using
user-specific information about how they want the document to appear,
including any market- or genre-specific styles. Press a button, out pops
the PDF or RTF. For now, I'll settle for PDF. :)

Good choice. The long-term challenge of these systems is "round-tripping"
but you may want to avoid that for now; an opaque format like PDF helps
force users to edit their original input, not the system's output (which
then has to become input again).

I didn't write the perl script, thus my frustration (as a Python person).
My partner-in-crime and I have come at the problem from entirely different
directions.


This can be useful.

> Now it has some regexp support, XSLT 2.0 should be at least a credible
> option here, but its features have yet to be stress-tested TMK and
> tools support is still somewhat up in the air. (I believe Mike Kay is
> speaking on this very topic at XML 2004 this November in Washington
> DC.)

OK, that's what I'd been beginning to understand based on list comments. I
wasn't aware of the tool support problem.

Saxon 8 is available but other vendors are standing in the wings (where
they're hard to see). Only when we have a range of tools will it become
clear (IMHO) how well the spec is designed. (For example the fact that W3C
XML Schema implementations differ on details of implementation compromises
the use of Schema generally, since its portability is impaired. This is a
shame, though getting the spec right the first time in every detail on
something like Schema is near-impossible; over time we can hope this
situation will improve.)

> A split-down-the-middle option could be to write a little function
> library in the language of your choice to do the upconversion
> string-processing, and call out to it from your XSLT using extension
> functions. (This is what I kind of imagined would happen five years
> ago, but it turns out processor-dependent extension functions are
> unfashionable these days.)

This is an intriguing option.

For this to work the text has to start life as some kind of XML, though
that could be nothing but a dumb wrapper. Then you'd need a processor whose
API allows you to return node-sets from functions.

Also, don't forget that XSLT 2 gives user-defined functions, so for many
things it may be possible to avoid the external language altogether.
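
To make the "dumb wrapper" idea concrete, here is a minimal Python sketch of
the string-processing half only. It assumes a purely hypothetical plain-text
convention (blank lines separate paragraphs, *asterisks* mark emphasis) and
invented element names (doc, p, em); none of these come from your actual
format, and hooking the result up to an XSLT processor is left out entirely.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical plain-text convention: *word* marks emphasis.
EMPH = re.compile(r"\*([^*\n]+)\*")


def upconvert_line(text):
    """Turn one paragraph of plain text into a <p> element, with *...* as <em>."""
    p = ET.Element("p")
    pos = 0
    last = None  # most recent <em> child; its .tail receives the text after it
    for m in EMPH.finditer(text):
        chunk = text[pos:m.start()]
        if last is None:
            p.text = (p.text or "") + chunk
        else:
            last.tail = (last.tail or "") + chunk
        last = ET.SubElement(p, "em")
        last.text = m.group(1)
        pos = m.end()
    rest = text[pos:]
    if last is None:
        p.text = (p.text or "") + rest
    else:
        last.tail = (last.tail or "") + rest
    return p


def wrap_manuscript(raw):
    """Wrap blank-line-separated paragraphs in a dumb <doc> wrapper."""
    doc = ET.Element("doc")
    for block in re.split(r"\n\s*\n", raw.strip()):
        # Collapse internal line breaks and runs of whitespace to single spaces.
        doc.append(upconvert_line(" ".join(block.split())))
    return doc


if __name__ == "__main__":
    sample = "It was a *dark* night.\n\nNobody & nothing stirred."
    print(ET.tostring(wrap_manuscript(sample), encoding="unicode"))
```

Because the tree is built through an XML library rather than by pasting
strings together, characters such as & are escaped on output and the result
is well-formed by construction.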

99% of the problem comes from documents saved in the native platform that
aren't correctly tagged. I'm not quite certain what to do about this so
that the editing is transparent. Yet.

I think this is the most difficult problem. This is why XML's
well-formedness rules constitute its secret weapon. (Felt only when they
chafe, this set of rules makes all downstream issues much easier to deal
with, so XML developers can be quite unconscious of how much we don't have
to think about.)

You need a way to trap and fix bad incoming tagging before it gets into
your system, where it's expensive to deal with.

A plain-text editing window is appealing (many writers like their
keyboards), but you're going to need at least a "galley" preview on input,
before commit, or you're going to go insane. A real grammar for your syntax
would be even better.
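
The "trap it before it gets in" step might look something like the following
Python sketch. It assumes the incoming text has already been put into some
XML form, and it checks well-formedness only; the function name is made up
for illustration, and a real gate would presumably also check your own
tagging rules.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, message).

    The message includes the line and column reported by the parser so
    the author can be pointed at the problem before anything is committed.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, col = err.position
        return False, f"line {line}, column {col}: {err}"
    return True, None


if __name__ == "__main__":
    for candidate in ("<doc><p>fine</p></doc>", "<doc><p>unclosed</doc>"):
        ok, problem = check_well_formed(candidate)
        print("accepted" if ok else f"rejected ({problem})")
```

The point of the design is only that rejection happens at the door, with a
position the writer can act on, rather than somewhere downstream.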

I feel moderately confident that this might make it a more contiguous
process, which would also require fewer installed pieces in order to work.


Yes.

> I'd be interested to hear myself from the list on this question. I haven't
> yet myself seen a really nice route to RTF. I think two passes to this
> (analogous to the way IBM deployed a "TeXML" which could be targeted as a
> route to TeX) might be the best way to do it: have yet another tag set that
> describes only the formatting primitives supported by RTF and a utility
> stylesheet to make RTF out of that. Or use XSL-FO, if any of the formatters
> can make decent RTF yet.

jfor hasn't been updated at all in over a year, so it seems like a dead
project. And jfor.org is down.

An indication that the problem of generating nice RTF is harder than it may
first appear.
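
For what it's worth, the second pass of the two-pass idea (a tag set that
names only formatting primitives, plus a small utility to serialize it) can
be sketched in a few lines of Python. The intermediate tag set here (rtfdoc,
para, b, i) is invented for illustration, the font and size in the header
are arbitrary, and only a tiny corner of RTF is covered; this is a toy, not
a claim about how any existing tool does it.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate tag set: <rtfdoc> holding <para>, with <b>/<i> inline.
CONTROL = {"b": "\\b", "i": "\\i"}


def esc(text):
    """Escape RTF special characters; emit non-ASCII as \\uN? escapes."""
    out = []
    for ch in text or "":
        code = ord(ch)
        if ch in "\\{}":
            out.append("\\" + ch)
        elif code < 128:
            out.append(ch)
        elif code < 0x10000:
            # \uN takes a signed 16-bit value; "?" is the fallback character.
            n = code if code < 0x8000 else code - 0x10000
            out.append("\\u%d?" % n)
        else:
            out.append("?")  # outside the BMP: not handled in this sketch
    return "".join(out)


def inline(elem):
    """Serialize an element's text, children, and tails to RTF."""
    parts = [esc(elem.text)]
    for child in elem:
        word = CONTROL.get(child.tag)
        body = inline(child)
        # Unknown inline tags are passed through as plain text.
        parts.append(("{" + word + " " + body + "}") if word else body)
        parts.append(esc(child.tail))
    return "".join(parts)


def to_rtf(xml_text):
    """Convert the intermediate XML to a minimal RTF document string."""
    root = ET.fromstring(xml_text)
    paras = ["\\pard " + inline(p) + "\\par" for p in root.iter("para")]
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24"
    return header + "\n" + "\n".join(paras) + "\n}"


if __name__ == "__main__":
    print(to_rtf("<rtfdoc><para>A <b>bold</b> and <i>quiet</i> {move}</para></rtfdoc>"))
```

Everything hard about RTF (styles, page layout, headers, tables) is exactly
what this leaves out, which is rather the point of the remark above.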

I should add that I *do* need API access rather than a standalone
application.

If it were me I'd be inclined to see how far I could go with XSLT 2. But
then, I like XSLT. I am actually fairly hopeful that XSLT 2 processors will
be strong contenders in this space.


Cheers,
Wendell

___&&__&_&___&_&__&&&__&_&__&__&&____&&_&___&__&_&&_____&__&__&&_____&_&&_
    "Thus I make my own use of the telegraph, without consulting
     the directors, like the sparrows, which I perceive use it
     extensively for a perch." -- Thoreau
