Re: [xsl] tricky string matching

2011-03-14 04:20:42
Another approach:

- in every element that contains a 'tief' element:
  - use analyze-string to replace whitespace characters with an element (let's call it 'ws')
  - in a second pass, group starting with ws, e.g.,
    <ws string=" "/>CO<tief>2</tief>  =>  <word>CO<tief>2</tief></word>
  - in a third pass, replace each word that contains a tief with <alias kw="{word}">, keeping word/node() as its content
  - in the same pass, dissolve each word without a tief to plain text
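
A minimal XSLT 2.0 sketch of these three passes, untested against the real data and resting on a few assumptions: the input has no elements of its own named 'ws' or 'word', no namespaces are involved, and only whitespace in the direct text children of the containing element separates words. The mode names 'mark-ws' and 'resolve' are made up for this illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default mode: identity copy for everything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any element with a tief child runs the three passes on its content. -->
  <xsl:template match="*[tief]">
    <!-- Pass 1: whitespace runs in direct text children become ws elements. -->
    <xsl:variable name="pass1">
      <xsl:apply-templates select="node()" mode="mark-ws"/>
    </xsl:variable>
    <!-- Pass 2: each group starts at a ws; the rest of the group is one word. -->
    <xsl:variable name="pass2">
      <xsl:for-each-group select="$pass1/node()" group-starting-with="ws">
        <xsl:variable name="rest" select="current-group()[not(self::ws)]"/>
        <!-- Put the original whitespace back as plain text. -->
        <xsl:value-of select="current-group()[1][self::ws]/@string"/>
        <xsl:if test="exists($rest)">
          <word><xsl:copy-of select="$rest"/></word>
        </xsl:if>
      </xsl:for-each-group>
    </xsl:variable>
    <!-- Pass 3: turn the temporary word elements into alias or plain text. -->
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="$pass2/node()" mode="resolve"/>
    </xsl:copy>
  </xsl:template>

  <!-- Pass 1 templates. -->
  <xsl:template match="text()" mode="mark-ws">
    <xsl:analyze-string select="." regex="\s+">
      <xsl:matching-substring>
        <ws string="{.}"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Child elements (tief and others) are copied unchanged in pass 1. -->
  <xsl:template match="* | comment() | processing-instruction()" mode="mark-ws">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Pass 3 templates. -->
  <xsl:template match="word[tief]" mode="resolve">
    <!-- The string value of the word, e.g. CO2, becomes the keyword. -->
    <alias kw="{.}">
      <xsl:copy-of select="node()"/>
    </alias>
  </xsl:template>

  <xsl:template match="word" mode="resolve">
    <xsl:copy-of select="node()"/>
  </xsl:template>

</xsl:stylesheet>
```

Note that a word is everything between two whitespace runs, so punctuation directly attached to a formula (as in "CO<tief>2</tief>,") would end up inside the alias and in kw. If that matters, the regex in pass 1 would have to split on punctuation as well.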

Gerrit

On 2011-03-14 09:52, Szabo, Patrick (LNG-VIE) wrote:
Hi,

I'm using XSLT 2.0 and Saxon 9.

Example-snippet from my input:

...
<absatz>text text text text text text text text CO<tief>2</tief>  text
text text text text text</absatz>
<absatz>text text text text text text text text H<tief>2</tief>O text
text text text text text</absatz>
...

What I have to do is turn this into the following:

...
<absatz>text text text text <alias kw="CO2">CO<tief>2</tief></alias>
text text text text text text</absatz>
<absatz>text text text text <alias kw="H2O">H<tief>2</tief>O</alias>
text text text text text text</absatz>
...

I do have an idea for solving this problem, but it sounds very
inefficient to me.

What would you suggest?

I would compile a list of all the possible "strings", like

...
CO2
H2O
...

Then I would flatten the absatz so that there are no <tief> elements anymore.
After that I would tokenize all the text() nodes and check whether any token
matches an entry in my list.

Is there a better way?
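
For illustration, the flatten-and-tokenize idea could be sketched in XSLT 2.0 roughly as follows. The variable name 'known' and the text output are assumptions made for this sketch, and the list holds only the two example entries. It reports which list entries occur in each absatz but does not put the alias markup back, because flattening discards the position of the tief elements.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical list of known strings; extend as needed. -->
  <xsl:variable name="known" select="('CO2', 'H2O')"/>

  <xsl:template match="/">
    <xsl:for-each select="//absatz">
      <!-- The string value of absatz is its text with all markup dropped,
           so CO<tief>2</tief> reads as CO2 here. -->
      <xsl:variable name="hits"
          select="tokenize(normalize-space(.), ' ')[. = $known]"/>
      <!-- One output line per absatz, listing the matching tokens. -->
      <xsl:value-of select="$hits" separator=", "/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```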

Kind regards

. . . . . . . . . . . . . . . . . . . . . . . . . .
Patrick Szabo
  XSLT Developer
LexisNexis
Marxergasse 25, 1030 Wien

mailto:patrick(_dot_)szabo(_at_)lexisnexis(_dot_)at
Tel.: +43 (1) 534 52 - 1573
Fax: +43 (1) 534 52 - 146






--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit(_dot_)imsieke(_at_)le-tex(_dot_)de, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler
