xsl-list

Re: parsing post script

2003-11-25 07:38:13

Ghostscript includes a ps2ascii utility to extract text. It does a
reasonable but not 100% accurate job, and it runs on the full Ghostscript
PostScript interpreter.

If you turn off ps2ascii's simple mode (remove the "-dSIMPLE" argument),
Ghostscript outputs font and positioning information for each string.
You can use that information to eliminate headers and footers, identify
elements to tag, and so forth.
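
As a rough sketch of the header/footer idea, here is a small Python filter
that keeps only strings whose vertical position falls inside a body band.
It assumes each string record looks like "S <x> <y> (<text>) <width>";
check that against what your Ghostscript version actually emits. The
function name body_strings, the y thresholds, and the sample records are
all made up for illustration.

```python
import re

# ASSUMPTION: string records have the shape  S <x> <y> (<text>) <width>
# Verify this against the real non-SIMPLE output before relying on it.
STRING_RE = re.compile(r"^S\s+(-?\d+)\s+(-?\d+)\s+\((.*)\)\s+(-?\d+)\s*$")


def body_strings(lines, y_min, y_max):
    """Return (x, y, text) for string records with y_min <= y <= y_max.

    Records that do not match the assumed string shape (font changes,
    page breaks, and so on) are skipped.
    """
    kept = []
    for line in lines:
        m = STRING_RE.match(line)
        if not m:
            continue
        x, y, text = int(m.group(1)), int(m.group(2)), m.group(3)
        if y_min <= y <= y_max:
            kept.append((x, y, text))
    return kept


# Hypothetical sample records, not real Ghostscript output.
sample = [
    "F 100 28 (Times-Roman)",
    "S 720 7500 (Chapter 1) 450",        # running header
    "S 720 6000 (Body text here.) 700",  # body
    "S 3000 400 (Page 12) 300",          # footer
    "P",
]

for x, y, text in body_strings(sample, 1000, 7000):
    print(x, y, text)
```

The same loop could also track the most recent font record and use it to
decide which strings are headings, which is one way to pick elements to
tag.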

Exegenix (http://exegenix.com/) has a commercial solution for converting
PostScript or PDF to XML; it looks intriguing.

--
Larry Kollar    k  o  l  l  a  r  @  a  l  l  t  e  l  .  n  e  t
"The hardest part of all this is the part that requires thinking."
-- Paul Tyson, on xml-doc

XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


