On 25.11.2003 (13:09 +0530), Karthikeyan Ramnath wrote:
> Thanks, dude, that's what I'm doing (more or less). I've stripped all the font
> and formatting info and converted the plain text into a basic XML doc which
> specifies the X,Y position of each element.
> Next I hope to define a transform that will convert the absolute
> coordinates into more meaningful numbers, like the column indices of a table
> and so on.
> Any input in this regard will be more than welcome.
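One possible shape for that coordinate-to-column transform is to cluster the coordinates: sort all X values, start a new column wherever the gap to the previous value exceeds a tolerance, and do the same for Y to get rows. Below is a minimal Python sketch, assuming each element has already been reduced to an (x, y, text) triple; the `Element` class, the function names and the tolerance values are hypothetical, and the sketch assumes left-aligned columns.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Element:
    """One text element from the intermediate XML (hypothetical shape)."""
    x: float     # absolute x position in points
    y: float     # absolute y position; PostScript's default y axis points up
    text: str


def cluster_starts(values, tolerance):
    """Sort the values and open a new cluster whenever the gap to the
    previous value exceeds `tolerance`; return each cluster's smallest value."""
    starts = []
    prev = None
    for v in sorted(values):
        if prev is None or v - prev > tolerance:
            starts.append(v)
        prev = v
    return starts


def to_grid(elements, x_tol=5.0, y_tol=3.0):
    """Replace absolute coordinates with (row, column, text) triples."""
    col_starts = cluster_starts([e.x for e in elements], x_tol)
    # Negate y so that row 0 is the top of the page.
    row_starts = cluster_starts([-e.y for e in elements], y_tol)
    grid = []
    for e in elements:
        # Index of the last cluster that starts at or before this value.
        row = bisect_right(row_starts, -e.y) - 1
        col = bisect_right(col_starts, e.x) - 1
        grid.append((row, col, e.text))
    return grid
```

For example, elements at (72, 700), (200, 700), (72, 686) and (201.5, 686.5) come out as rows 0 and 1, columns 0 and 1. Right-aligned or centred columns (typical for numbers) would defeat clustering on the left edge; those would need the right edge or centre of each string, which in turn needs the string widths from the font information that was stripped.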
If you successfully extracted positioning data from the PostScript, you are
relying on a very particular format, because in the real world this data can be
hidden in countless ways. The same is true for string data: it can be encoded,
and even in the simplest environment non-ASCII characters will appear as
octal-escaped numbers (such as \351). Next, the way those numbers relate to
particular glyphs (= letters of a given font) depends on the encoding vector,
which can be set up (= programmed) in many different ways as well...
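To make the string-data point concrete, here is a rough Python sketch that decodes the body of a PostScript literal string (the escapes `\n`, `\(`, `\\`, `\ddd` and so on) into byte codes, then maps those codes to glyph names through two tiny encoding-vector excerpts. The excerpts are assumed to match the StandardEncoding and ISOLatin1Encoding tables in the PostScript Language Reference Manual; the function names are hypothetical, and hex strings (`<...>`) and fonts with custom encodings are not handled.

```python
# Escapes recognised inside a PostScript literal string, besides \ddd.
ESCAPES = {"n": 0x0A, "r": 0x0D, "t": 0x09, "b": 0x08, "f": 0x0C,
           "\\": 0x5C, "(": 0x28, ")": 0x29}
OCTAL = "01234567"


def decode_ps_string(body: str) -> bytes:
    """Decode the text between the outer ( ) of a literal string into byte
    codes. Assumes `body` itself contains only ASCII characters."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ord(ch))
            i += 1
            continue
        i += 1                          # skip the backslash
        if i == len(body):
            break                       # trailing lone backslash: nothing to decode
        nxt = body[i]
        if nxt in ESCAPES:
            out.append(ESCAPES[nxt])
            i += 1
        elif nxt in OCTAL:
            j = i
            while j < len(body) and j - i < 3 and body[j] in OCTAL:
                j += 1                  # \d, \dd or \ddd
            out.append(int(body[i:j], 8) & 0xFF)  # high-order overflow is ignored
            i = j
        elif nxt in "\r\n":
            # Backslash + end of line is a line continuation: emits nothing.
            i += 2 if body[i:i + 2] == "\r\n" else 1
        else:
            out.append(ord(nxt))        # unknown escape: the backslash is dropped
            i += 1
    return bytes(out)


# Tiny excerpts of two encoding vectors (assumed from the PLRM tables).
STANDARD_EXCERPT = {0o341: "AE", 0o351: "Oslash"}
ISOLATIN1_EXCERPT = {0o341: "aacute", 0o351: "eacute"}


def glyph_names(codes: bytes, encoding: dict) -> list:
    """Map byte codes to glyph names. ASCII letters are named after themselves
    in both vectors; everything else outside the excerpt is reported as
    '.notdef' here purely for brevity."""
    names = []
    for c in codes:
        if c in encoding:
            names.append(encoding[c])
        elif c < 128 and chr(c).isalpha():
            names.append(chr(c))
        else:
            names.append(".notdef")
    return names
```

Decoding the string from `(caf\351) show` gives the bytes `caf` plus 0351; the same code comes out as `eacute` under one vector and `Oslash` under the other, so the text cannot be recovered without knowing how the font's encoding was set up.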
Why can't you go back one step and work with the source of the PostScript
files, or use that source to include machine-parsable data (like comments) in
the PostScript data?
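A sketch of the second suggestion, under an invented convention: the program that generates the PostScript also writes a private comment such as `%CELL row=1 col=0 text=Apples` next to each piece of table text, and a small Python script collects those comments into XML for the XSL side. The `%CELL` syntax and the function name are hypothetical; the PostScript interpreter ignores comment lines, so the printed output is unaffected.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical convention written by the PostScript generator:
#   %CELL row=<n> col=<n> text=<cell text>
# A single leading "%" keeps it out of the "%%" namespace used by DSC comments.
CELL_RE = re.compile(r"^%CELL row=(\d+) col=(\d+) text=(.*)$")


def cells_to_xml(ps_source: str) -> ET.Element:
    """Collect the %CELL comments into <table><row><cell/></row></table>.
    Rows appear in the order in which they are first seen."""
    table = ET.Element("table")
    rows = {}
    for line in ps_source.splitlines():
        m = CELL_RE.match(line)
        if m is None:
            continue  # ordinary PostScript: ignored
        row, col, text = int(m.group(1)), m.group(2), m.group(3)
        if row not in rows:
            rows[row] = ET.SubElement(table, "row", n=str(row))
        ET.SubElement(rows[row], "cell", col=col).text = text
    return table


SAMPLE = """%!PS
/Helvetica findfont 10 scalefont setfont
%CELL row=0 col=0 text=Name
72 700 moveto (Name) show
%CELL row=0 col=1 text=Qty
200 700 moveto (Qty) show
%CELL row=1 col=0 text=Apples
72 686 moveto (Apples) show
showpage
"""

if __name__ == "__main__":
    print(ET.tostring(cells_to_xml(SAMPLE), encoding="unicode"))
```

Because the cell text would be written by the generating program rather than recovered from `show` operands, this sidesteps both the octal escapes and the encoding vector, and the resulting XML could go straight into an XSLT stylesheet.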
- Michael
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list