xsl-list

Re: [xsl] Grouping of text input file lines

2013-08-11 12:45:07
I've generally done this using your second approach: convert each line to an 
element and then use group-starting-with to group them.
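
A minimal XSLT 2.0 sketch of that approach. It assumes the input is available as a string; here an abridged copy of the example from the question is embedded in the stylesheet instead of being read with unparsed-text(). The line elements are built inside a temporary document so that they have a parent and can match a pattern such as "rel". The template name "main" and the output element names are arbitrary choices (with Saxon the template can be started with -it:main).

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>

  <!-- Stand-in for the input file, e.g. unparsed-text($uri). -->
  <xsl:variable name="text" as="xs:string">node: abc
key: CMOS
rel: rlo_one
com: a relation
ele: fa int
def: 0
rel: rlo_two
ele: fx int
def: 0</xsl:variable>

  <!-- Step 1: one element per non-empty line, named after its tag.
       No "as" attribute, so this is a temporary document and the
       elements have a parent, which lets them match the pattern "rel". -->
  <xsl:variable name="lines">
    <xsl:for-each select="tokenize($text, '\r?\n')[normalize-space()]">
      <xsl:element name="{normalize-space(substring-before(., ':'))}">
        <xsl:value-of select="normalize-space(substring-after(., ':'))"/>
      </xsl:element>
    </xsl:for-each>
  </xsl:variable>

  <!-- Step 2: each "rel" line starts a new group. -->
  <xsl:template name="main">
    <db>
      <xsl:for-each-group select="$lines/*" group-starting-with="rel">
        <xsl:choose>
          <xsl:when test="self::rel">
            <rel name="{.}">
              <xsl:copy-of select="current-group() except ."/>
            </rel>
          </xsl:when>
          <!-- Lines before the first "rel" are the DB options. -->
          <xsl:otherwise>
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </db>
  </xsl:template>
</xsl:stylesheet>
```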

In XSLT 3.0 we're allowing patterns to match atomic values, so you can do 
group-starting-with on a sequence of strings.
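
A sketch of what that looks like, assuming an XSLT 3.0 processor (a recent Saxon release, for instance): the predicate pattern .[starts-with(., 'rel:')] matches string items, so the tokenized lines can be grouped without first turning them into elements. The sample lines and the db/group/line output names are illustrative only; the template name "main" is arbitrary.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>

  <!-- Illustrative sample lines. -->
  <xsl:variable name="text" as="xs:string">node: abc
key: CMOS
rel: rlo_one
com: a relation
rel: rlo_two
num: 50</xsl:variable>

  <xsl:template name="main">
    <db>
      <!-- The population is a sequence of strings, not nodes; the
           predicate pattern .[...] matches atomic values as well. -->
      <xsl:for-each-group select="tokenize($text, '\r?\n')[normalize-space()]"
                          group-starting-with=".[starts-with(., 'rel:')]">
        <group>
          <xsl:for-each select="current-group()">
            <line><xsl:value-of select="."/></line>
          </xsl:for-each>
        </group>
      </xsl:for-each-group>
    </db>
  </xsl:template>
</xsl:stylesheet>
```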

Michael Kay
Saxonica

On 11 Aug 2013, at 15:46, Wolfgang Laun wrote:

I'll briefly describe the problem and outline two approaches to a
solution. I'd be pleased to receive a comment or two.

The task is to convert a plain text file to XML using XSLT 2.0. Every
line of the text file has the form
 tag: value
and the lines are grouped at three levels: "database", "relation" and
"field". Each entity has some options and, except for a field, one or
more children at the next lower level.
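
Since every line has the form tag: value, one way to turn a line into an element is xsl:analyze-string with an anchored regular expression, so that lines which do not fit are flagged rather than silently mis-parsed. A sketch in XSLT 2.0, assuming tags consist of ASCII letters only and that leading indentation may occur; the sample lines, the bad-line element and the template name "main" are made up for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>

  <!-- Illustrative sample; the last line is deliberately malformed. -->
  <xsl:variable name="text" as="xs:string">node: abc
 rel: rlo_one
   ele: fa int
   def: "----"
oops, no tag here</xsl:variable>

  <xsl:template name="main">
    <lines>
      <xsl:for-each select="tokenize($text, '\r?\n')[normalize-space()]">
        <!-- Group 1: the tag (assumed ASCII letters); group 2: the value. -->
        <xsl:analyze-string select="." regex="^\s*([A-Za-z]+):\s*(.*)$">
          <xsl:matching-substring>
            <xsl:element name="{regex-group(1)}">
              <xsl:value-of select="normalize-space(regex-group(2))"/>
            </xsl:element>
          </xsl:matching-substring>
          <!-- Hypothetical marker for lines that are not "tag: value". -->
          <xsl:non-matching-substring>
            <bad-line><xsl:value-of select="."/></bad-line>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:for-each>
    </lines>
  </xsl:template>
</xsl:stylesheet>
```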

Example (indentation shows the nesting level):

node: abc                       # a DB option
key: CMOS                       # a DB option
rel: rlo_one
 com: a relation                # a relation option
 alg: direct                    # a relation option
 ele: fa int
   com: blurb                   # element (field) options
   def: 0
   acc: px
   acc: py
 ele: fb chars
   com: bla bla
   def: "----"
   alg: permute
 num: 100                       # a relation option
rel: rlo_two
 com: another relation          # a relation option
 com: more comment
 com: yet more comment
 ele: fx int
   com: blurb
   def: 0
   acc: px
 ele: fy int
   com: bla bla
   def: 42
 num: 50                        # a relation option

The expected XML structure is obvious, I think: a sequence of DB
options and relation elements; the relations contain relation options
and field elements, which in turn contain field options. The order of
the fields must not be changed. Multiple "com" entries of the same
entity should be joined, keeping a line break between them; multiple
"acc" entries should be joined too, but with a space.
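
The joining rule could be expressed with group-by on the element name, which keeps the options in order of first appearance of each tag. A sketch in XSLT 2.0, assuming the lines have already been turned into elements (written out literally here from the example) and that other repeated tags should simply be copied; merge-options, demo and the template name "main" are hypothetical names.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>

  <!-- Option lines already converted to elements (from the example). -->
  <xsl:variable name="rel-options">
    <com>another relation</com>
    <com>more comment</com>
    <com>yet more comment</com>
    <num>50</num>
  </xsl:variable>
  <xsl:variable name="field-options">
    <com>blurb</com>
    <def>0</def>
    <acc>px</acc>
    <acc>py</acc>
  </xsl:variable>

  <!-- Hypothetical helper: joins com entries with a newline and acc
       entries with a space; any other option is copied unchanged. -->
  <xsl:template name="merge-options">
    <xsl:param name="options" as="element()*"/>
    <xsl:for-each-group select="$options" group-by="name()">
      <xsl:choose>
        <xsl:when test="current-grouping-key() = ('com', 'acc')">
          <xsl:element name="{current-grouping-key()}">
            <xsl:value-of select="string-join(current-group(),
                if (current-grouping-key() = 'com') then '&#10;' else ' ')"/>
          </xsl:element>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template name="main">
    <demo>
      <rel name="rlo_two">
        <xsl:call-template name="merge-options">
          <xsl:with-param name="options" select="$rel-options/*"/>
        </xsl:call-template>
      </rel>
      <field name="fa">
        <xsl:call-template name="merge-options">
          <xsl:with-param name="options" select="$field-options/*"/>
        </xsl:call-template>
      </field>
    </demo>
  </xsl:template>
</xsl:stylesheet>
```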

The first basic idea, which I used throughout, is to maintain a second
string sequence in parallel with the one containing the text lines.
That sequence contains just the tags, so that index-of can be used to
compute the positions of "interesting" lines (for instance, every
"rel" line). This way, the subsequences of lines belonging to all
relations, or to an individual relation or field, can be extracted
conveniently.
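
A sketch of this index-of technique in XSLT 2.0: $tags holds the tag of each line at the same position, index-of($tags, 'rel') yields the line numbers where relations start, and subsequence() cuts out the lines of each relation. A sentinel position one past the last line closes the final relation. The abridged sample data, the output attribute names and the template name "main" are illustrative only.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>

  <!-- Abridged sample data. -->
  <xsl:variable name="text" as="xs:string">node: abc
key: CMOS
rel: rlo_one
com: a relation
ele: fa int
def: 0
ele: fb chars
num: 100
rel: rlo_two
ele: fx int
num: 50</xsl:variable>

  <xsl:variable name="lines" as="xs:string*"
      select="tokenize($text, '\r?\n')[normalize-space()]"/>

  <!-- Parallel sequence: $tags[$n] is the tag of $lines[$n]. -->
  <xsl:variable name="tags" as="xs:string*"
      select="for $l in $lines return normalize-space(substring-before($l, ':'))"/>

  <xsl:template name="main">
    <!-- Line numbers of all "rel" lines, plus a sentinel past the end. -->
    <xsl:variable name="starts" as="xs:integer*"
        select="index-of($tags, 'rel'), count($lines) + 1"/>
    <relations db-options="{$starts[1] - 1}">
      <xsl:for-each select="1 to count($starts) - 1">
        <xsl:variable name="i" as="xs:integer" select="."/>
        <xsl:variable name="len" as="xs:integer"
            select="$starts[$i + 1] - $starts[$i]"/>
        <!-- The lines of relation $i and their tags. -->
        <xsl:variable name="rel-lines" select="subsequence($lines, $starts[$i], $len)"/>
        <xsl:variable name="rel-tags" select="subsequence($tags, $starts[$i], $len)"/>
        <rel name="{normalize-space(substring-after($rel-lines[1], ':'))}"
             first-line="{$starts[$i]}" line-count="{$len}"
             fields="{count(index-of($rel-tags, 'ele'))}"/>
      </xsl:for-each>
    </relations>
  </xsl:template>
</xsl:stylesheet>
```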

The second idea is to use grouping. The sequence of lines is converted
to a sequence of <tag>value</tag> nodes, and a nested
group-starting-with separates relations and fields - almost. As you
can see, there are some leading lines defining DB options, and each
relation has option lines both before and after its element (field)
groups. Most likely, these lines and line groups have to be
cherry-picked with the technique described above before the glorious
for-each-group can be applied.
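
One way to avoid the cherry-picking, sketched here under an explicit assumption: if the set of tags that may appear as field options is known in advance (com, def, acc and alg in the example), every line whose tag is not in that set, such as ele or num, can start an inner group. Leading and trailing relation options then end up in groups that do not start with ele. A trailing relation option whose tag is also a field-option tag (a com after the last field, say) would still be attached to the last field; plain grouping cannot resolve that ambiguity. Splitting "fa int" into a name and a type at the first space is also an assumption, and the output names are illustrative.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output indent="yes"/>

  <!-- The example input without its comments. -->
  <xsl:variable name="text" as="xs:string">node: abc
key: CMOS
rel: rlo_one
com: a relation
alg: direct
ele: fa int
com: blurb
def: 0
acc: px
acc: py
ele: fb chars
com: bla bla
def: "----"
alg: permute
num: 100
rel: rlo_two
com: another relation
com: more comment
com: yet more comment
ele: fx int
com: blurb
def: 0
acc: px
ele: fy int
com: bla bla
def: 42
num: 50</xsl:variable>

  <!-- One element per line, inside a temporary document so that the
       elements have a parent and can match patterns. -->
  <xsl:variable name="lines">
    <xsl:for-each select="tokenize($text, '\r?\n')[normalize-space()]">
      <xsl:element name="{normalize-space(substring-before(., ':'))}">
        <xsl:value-of select="normalize-space(substring-after(., ':'))"/>
      </xsl:element>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template name="main">
    <database>
      <xsl:for-each-group select="$lines/*" group-starting-with="rel">
        <xsl:choose>
          <xsl:when test="self::rel">
            <relation name="{.}">
              <!-- ASSUMPTION: com, def, acc, alg are the only field-option
                   tags, so any other tag (ele, num, ...) starts a group. -->
              <xsl:for-each-group select="current-group() except ."
                  group-starting-with="*[not(name() = ('com', 'def', 'acc', 'alg'))]">
                <xsl:choose>
                  <xsl:when test="self::ele">
                    <!-- ASSUMPTION: "fa int" means name "fa", type "int". -->
                    <field name="{substring-before(concat(., ' '), ' ')}"
                           type="{substring-after(., ' ')}">
                      <xsl:copy-of select="current-group() except ."/>
                    </field>
                  </xsl:when>
                  <!-- Relation options before the first or after the last field. -->
                  <xsl:otherwise>
                    <xsl:copy-of select="current-group()"/>
                  </xsl:otherwise>
                </xsl:choose>
              </xsl:for-each-group>
            </relation>
          </xsl:when>
          <!-- Lines before the first "rel" are the DB options. -->
          <xsl:otherwise>
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </database>
  </xsl:template>
</xsl:stylesheet>
```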

Any better ideas?
Thanks

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


