Re: [xsl] Find inconsistencies: Perl or XSLT?

2010-12-01 12:01:42
Perhaps I am missing something here, but for this simple problem XSLT 1.0
and even XPath 1.0 seem to be good enough.


Problem:
identify duplicate source entries of unit elements


The input tags in the original post did not match; a corrected input.xml is shown below.


If the input file is of moderate size, this simple XPath expression will do:

$ xpath++ "/data/unit[source=following-sibling::unit/source]" input.xml

===============================================================================
<unit id="1">
    <source>blabla</source>
    <target>plapla</target>
</unit>
===============================================================================
<unit id="2">
    <source>bleble</source>
    <target>pleple</target>
</unit>
$


For bigger files, using the key() function helps:

$ cat dupsrc.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:key name="source" match="node()" use="source"/>

  <xsl:template match="text()"/>

  <xsl:template match="/data/unit[count(key('source',source))>1]">
    <xsl:value-of select="concat(@id,'-',source,'&#10;')"/>
  </xsl:template>

</xsl:stylesheet>
$
$ xsltproc dupsrc.xsl input.xml
<?xml version="1.0"?>
1-blabla
2-bleble
4-blabla
5-bleble

$ cat input.xml
<data>
<unit id="1">
    <source>blabla</source>
    <target>plapla</target>
</unit>
<unit id="2">
    <source>bleble</source>
    <target>pleple</target>
</unit>
<unit id="3">
    <source>bloblo</source>
    <target>ploplo</target>
</unit>
<unit id="4">
    <source>blabla</source>
    <target>plapla</target>
</unit>
<unit id="5">
    <source>bleble</source>
    <target>lolailo</target>
</unit>
</data>
$
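
Note that dupsrc.xsl reports every unit whose source occurs more than once,
including units 1 and 4, whose targets agree. A minimal XSLT 1.0 sketch that
narrows this to the case asked about in the original post (same source,
different target) might look like the following. The key name "by-source" is
made up for illustration, and the sketch assumes exactly one source and one
target per unit under a /data root, as in input.xml above.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain-text report, so no XML declaration is emitted. -->
  <xsl:output method="text"/>

  <!-- Index every unit by the string value of its source child
       (hypothetical key name). -->
  <xsl:key name="by-source" match="unit" use="source"/>

  <xsl:template match="/data">
    <!-- A node-set != comparison is true if ANY pair of nodes differs,
         so this selects units for which some unit with the same source
         has a different target. -->
    <xsl:for-each select="unit[key('by-source', source)/target != target]">
      <xsl:value-of select="concat(@id, ': ', source, ' -> ', target, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against input.xml above, this should list only units 2 and 5, since they
share the source "bleble" but have different targets.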


Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
Fixpack team lead
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Dirk Wittkopp
Registered office: Boeblingen
Registration court: Amtsgericht Stuttgart, HRB 243294



From:       Michael Kay <mike(_at_)saxonica(_dot_)com>
To:         xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Date:       12/01/2010 04:06 PM
Subject:    Re: [xsl] Find inconsistencies: Perl or XSLT?



On 01/12/2010 14:46, Manuel Souto Pico wrote:
Dear all,

I need to process some files and I know how to do it in Perl, but, as
has been the case in the past with other things, perhaps
there's an (objectively) simpler or more efficient way to do it with
XSLT.

I have a file like this

<unit id="1">
    <source>blabla</source>
    <target>plapla</source>
</unit>
<unit id="2">
    <source>bleble</source>
    <target>pleple</source>
</unit>
<unit id="3">
    <source>bloblo</source>
    <target>ploplo</source>
</unit>
<unit id="4">
    <source>blabla</source>
    <target>plapla</source>
</unit>
<unit id="5">
    <source>bleble</source>
    <target>lolailo</source>
</unit>

I think the example is illustrative enough.

The target element contains the translation of the source element, and
the same source must always be translated in the same way, but
sometimes it is not. So what I'd like to do is find two or more units with
the same source but with different targets (like 2 and 5 in the
example, but unlike 1 and 4).

In Perl I would use an XML module (or not) and put the source elements
in the keys of a hash and the target elements in their corresponding
values. When assigning a new key-value pair, if the key already
exists, I compare the values. If they are equal, they pass, else they
are flagged and included in the report.

The report in this case would be something like:

The following inconsistencies have been found:
2: bleble ->  pleple
5: bleble ->  lolailo

Is it possible to do this in XSLT? Is it more efficient than doing it
in Perl as I was planning to? My knowledge of XSLT is very limited and
I can't see beyond transforming an XML file into another XML file.

Thanks a lot for your opinion.
Manuel

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


Something like this:

<xsl:for-each-group select="unit" group-by="source">
  <xsl:if test="count(distinct-values(current-group()/target)) gt 1">
    <conflicts-for source="{current-grouping-key()}">
      <xsl:value-of select="distinct-values(current-group()/target)"/>
    </conflicts-for>
  </xsl:if>
</xsl:for-each-group>
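
For the plain-text report format requested in the question, the same grouping
could be wrapped in a complete stylesheet. This is only a sketch: it assumes
the units sit under a /data root (as in the corrected input.xml earlier in the
thread), that each unit has exactly one source and one target, and that an
XSLT 2.0 processor such as Saxon is used.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Emit a plain-text report rather than XML. -->
  <xsl:output method="text"/>

  <xsl:template match="/data">
    <xsl:for-each-group select="unit" group-by="source">
      <!-- Report a group only when its units disagree on the target. -->
      <xsl:if test="count(distinct-values(current-group()/target)) gt 1">
        <xsl:for-each select="current-group()">
          <xsl:value-of select="concat(@id, ': ', source, ' -> ', target, '&#10;')"/>
        </xsl:for-each>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

For the five-unit example this should print one line each for units 2 and 5
and nothing for the other units.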

Michael Kay
Saxonica
