xsl-list

Re: [xsl] library for parsing RTF

2010-06-29 14:51:25
On Sun, Jun 27, 2010 at 16:07, Andriy Gerasika
<andriy(_dot_)gerasika(_at_)gmail(_dot_)com> wrote:

For a language as rich as RTF, regular expressions are not going to get
you all that far: they are probably only suitable for writing the
lexical analyzer (or tokenizer).
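
A minimal Python sketch of that division of labor, assuming only the syntax used in this thread: braces, control words with an optional numeric parameter, control symbols such as `\{`, and plain text. A regular expression does the tokenizing, and a small stack-based pass does the part a regex cannot, which is matching nested groups. The names `Token`, `tokenize` and `parse` are hypothetical. Hex escapes like `\'e9`, `\bin` data and destination semantics are not handled.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Optional, Union

class Token(NamedTuple):
    kind: str                    # "open", "close", "word", "symbol" or "text"
    value: str
    param: Optional[int] = None  # numeric parameter of a control word, e.g. 0 in \f0

# The lexer: one alternation per token kind, tried left to right.
TOKEN_RE = re.compile(
    r"(?P<open>\{)"
    r"|(?P<close>\})"
    r"|\\(?P<word>[a-z]+)(?P<param>-?\d+)? ?"  # one space after a control word is a delimiter
    r"|\\(?P<symbol>.)"                         # control symbol, e.g. \{ \} \\ \~
    r"|(?P<text>[^{}\\]+)",
    re.DOTALL,
)

def tokenize(rtf: str) -> Iterator[Token]:
    """Lexical analysis only: turn RTF source into a flat token stream."""
    for m in TOKEN_RE.finditer(rtf):
        if m.group("open") is not None:
            yield Token("open", "{")
        elif m.group("close") is not None:
            yield Token("close", "}")
        elif m.group("word") is not None:
            param = m.group("param")
            yield Token("word", m.group("word"), int(param) if param is not None else None)
        elif m.group("symbol") is not None:
            yield Token("symbol", m.group("symbol"))
        else:
            yield Token("text", m.group("text"))

Node = Union[Token, list]

def parse(tokens: Iterable[Token]) -> List[Node]:
    """Nest tokens into one list per {...} group, rejecting unbalanced braces."""
    stack: List[List[Node]] = [[]]
    for tok in tokens:
        if tok.kind == "open":
            group: List[Node] = []
            stack[-1].append(group)
            stack.append(group)
        elif tok.kind == "close":
            if len(stack) == 1:
                raise ValueError("unmatched '}'")
            stack.pop()
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unclosed '{'")
    return stack[0]
```

For example, `parse(tokenize(r"{\b bold} text"))` gives a nested list whose first element is the group `[Token("word", "b"), Token("text", "bold")]`.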


RTF syntax is not complex enough to require a BNF-based parser.

Assume the following RTF:
{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard
This is some {\b bold} text.\par
}

It can easily be converted with regular expressions into something like:
<g><rtf>1</rtf><ansi/><g><fonttbl/><f>0</f><fswiss/>Helvetica<sc/></g><f>0</f><pard/>
This is some <g><b/>bold</g> text.<par/>
</g>

where "g" corresponds to an RTF group (curly braces) and "sc" to a semicolon in RTF.
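
For comparison, a rough Python sketch of the regex-only conversion described above. This is a guess at the approach, not the poster's actual code: a single `re.sub` pass with one alternation per token kind, where `{` and `}` become `<g>` and `</g>`, a control word becomes an empty element (or one wrapping its numeric parameter), and `;` becomes `<sc/>`. The function name `rtf_to_xml` is hypothetical. Note that it turns every semicolon into `<sc/>`, including ones in body text, and treats control symbols as literal characters.

```python
import re
from xml.sax.saxutils import escape

# One alternation per token kind; re.sub tries them left to right at each position.
TOKEN = re.compile(r"""
    (?P<open>\{)
  | (?P<close>\})
  | \\(?P<word>[a-z]+)(?P<num>-?\d+)?\ ?   # control word, optional parameter, optional delimiting space
  | \\(?P<sym>.)                           # control symbol, e.g. an escaped brace
  | (?P<sc>;)
  | (?P<text>[^{}\\;]+)
""", re.VERBOSE | re.DOTALL)

def rtf_to_xml(rtf: str) -> str:
    """Rewrite each RTF token as an XML fragment, in a single regex pass."""
    def replace(m: re.Match) -> str:
        if m.group("open") is not None:
            return "<g>"
        if m.group("close") is not None:
            return "</g>"
        if m.group("word") is not None:
            name, num = m.group("word"), m.group("num")
            return f"<{name}>{num}</{name}>" if num is not None else f"<{name}/>"
        if m.group("sc") is not None:
            return "<sc/>"
        if m.group("sym") is not None:
            return escape(m.group("sym"))  # hex escapes like \'e9 are NOT decoded
        return escape(m.group("text"))     # escape &, < and > in plain text
    return TOKEN.sub(replace, rtf)

if __name__ == "__main__":
    sample = "{\\rtf1\\ansi{\\fonttbl\\f0\\fswiss Helvetica;}\\f0\\pard\nThis is some {\\b bold} text.\\par\n}"
    print(rtf_to_xml(sample))
```

Run on the sample RTF, this prints the same markup as the output shown above. Because each brace maps to exactly one tag, the result is well-formed for balanced input, but the structure it records is purely syntactic.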

I'm not sure a BNF parser would produce anything better...

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


This seems about as useful as a regex-based C compiler that compiles

   main() { printf ("Hello world!\n"); }

and _nothing_ else.

Just because you can write a regex for _one instance_ of a grammar
does not mean that you can (easily) use regexes to parse a generic
format.  RTF is generic: there are MANY valid ways to say similar
things in RTF.
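
The point about equivalent spellings can be made concrete. In RTF, `{\b bold}`, `{\b1 bold}` and `\b bold\b0` all render "bold" in bold type, yet a token-for-token regex conversion yields a different XML shape for each. Anything that wants the meaning has to track formatting state per group. A minimal Python sketch, assuming balanced braces and looking only at the `\b` control word (the function `bold_runs` is hypothetical):

```python
import re

# Groups: open brace, close brace, control word, its parameter, plain text.
TOKEN_RE = re.compile(r"(\{)|(\})|\\([a-z]+)(-?\d+)? ?|([^{}\\]+)")

def bold_runs(rtf: str) -> list:
    """Return the text runs that render bold, tracking the bold flag per group."""
    stack = [False]  # one bold flag per open group; a new group inherits its parent's flag
    runs = []
    for m in TOKEN_RE.finditer(rtf):
        opening, closing, word, param, text = m.groups()
        if opening:
            stack.append(stack[-1])
        elif closing:
            stack.pop()  # leaving a group restores the enclosing formatting
        elif word == "b":
            stack[-1] = param != "0"  # \b and \b1 switch bold on, \b0 switches it off
        elif text and stack[-1]:
            runs.append(text)
    return runs
```

All three spellings above give `["bold"]`, even though their regex-converted XML differs.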

