Re: DocBook to plain text - what do you use?

At 11:42 -0400 7/28/04, Wendell Piez wrote:

At 08:47 AM 7/28/2004, Paul wrote:
I agree that it seems like it should be much easier.  That's one reason
I'm puzzled that such a thing doesn't seem to exist.
If you think about, for example, the way interpolations of inline pseudo-markup (like *this* for emphasis) and similar constructs will affect, for example, line wrapping, particularly since
  Some blocks need to get indented like this: it is several lines
  long, and is required to *wrap* nicely, no matter what might turn
  up in it -- requiring the smart introduction of whitespace both at
  line ends and at line starts (and maybe the extent of the indent
  varies as well) --
then it is apparent that creating "pretty plain text" is not as trivial as it may first appear.
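
To make the wrapping point concrete, here is a minimal sketch in Python rather than XSLT. It is not taken from any stylesheet discussed in this thread; the function name render_block and its parameters are hypothetical. It assumes the inline pseudo-markup has already been interpolated into the text, so that the added asterisks count toward line length, and it applies the same indent at every line start.

```python
import textwrap


def render_block(text, indent=2, width=72):
    """Wrap `text` to `width` columns with a uniform left indent.

    Hypothetical helper: assumes inline pseudo-markup such as *this*
    is already part of `text`, so its characters count toward the width.
    """
    pad = " " * indent
    # Collapse runs of whitespace (including source line breaks) first,
    # so the wrapper alone decides where each line ends.
    normalized = " ".join(text.split())
    return textwrap.fill(
        normalized,
        width=width,
        initial_indent=pad,
        subsequent_indent=pad,
        break_long_words=False,   # never split a word such as *wrap*
        break_on_hyphens=False,
    )


sample = (
    "Some blocks need to get indented like this: it is several lines "
    "long, and is required to *wrap* nicely, no matter what might turn "
    "up in it."
)
print(render_block(sample, indent=2, width=40))
```

Even this small case needs decisions (whether markup characters count toward the width, whether long words may be broken) that different people would make differently, and a varying indent or nested blocks would complicate it further.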


Sure.  I'm still surprised at how little it seems to have been attempted.
Or maybe people attempted and gave up! :-)

My guess is that the graceful XSLT-only solution will require two or three passes over the data.
Another sad fact of life is that one person's pretty plain text is another's ugly stepsister.
Is it just that no one is interested in producing plain text?  (For example,
to produce README files and such from a distribution's general DocBook
documentation sources?)  Or is the need little enough that lynx -dump
is good enough for people's purposes?
It seems to be one of those problems that is *nearly* general enough for a generic solution, but that has hidden gotchas and local particularities that have hindered the development of a one-size-fits-all solution.
Here's an article about an approach that uses Java (SAX) for the final stage of production of the plain text: http://www-106.ibm.com/developerworks/java/library/x-xmlist1/. So it's not that this problem hasn't come up before. (Not too long ago the list even discussed producing plain-text tables from XML -- a real beast.)


Thanks, I'll have a look.