xsl-list

Re: [xsl] is there a way to hash an element?

2016-06-10 04:30:28
If you have the opportunity to use XSLT 3.0, this might be a good use for 
accumulators; these visit every node in the tree and compute a value based on 
the previous value and the content of the node: in your case the value of the 
accumulator could be the hash function. With 2.0 you could achieve a similar 
effect using apply-templates with sibling recursion.
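
For the 3.0 route, a minimal accumulator sketch might look like the 
following. Everything specific in it is an assumption made for illustration: 
the accumulator name "hash", the multiplier 31, the modulus 2147483647, and the 
layout in which each message is a child of the document element. The hash is a 
simple rolling sum over codepoints, so it is far weaker than sha256sum and 
collisions are possible; candidate groups would still need a deep-equal() check.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Make the accumulator applicable to the principal input document. -->
  <xsl:mode use-accumulators="hash"/>

  <!-- Hypothetical rolling hash: new = (old * 31 + token) mod 2147483647.
       It is not collision-resistant in the way sha256sum is. -->
  <xsl:accumulator name="hash" as="xs:integer" initial-value="0">
    <!-- Start of a message (assumed: a child of the root): reset, and
         ignore the message's own name and attributes. -->
    <xsl:accumulator-rule match="/*/*" select="0"/>
    <!-- Start of a descendant element: fold in its name and attributes.
         Attribute nodes are not visited during the traversal, so they are
         read here; sum() makes the result independent of attribute order. -->
    <xsl:accumulator-rule match="/*/*//*"
        select="($value * 31 + sum(string-to-codepoints(
                   string-join((name(), @*/concat(name(), '=', .)), ' ')))
                ) mod 2147483647"/>
    <!-- Non-blank text: fold in the characters. -->
    <xsl:accumulator-rule match="/*/*//text()[normalize-space()]"
        select="($value * 31 + sum(string-to-codepoints(.))) mod 2147483647"/>
    <!-- End of a descendant element: mark the close so nesting matters. -->
    <xsl:accumulator-rule match="/*/*//*" phase="end"
        select="($value * 31 + 1) mod 2147483647"/>
  </xsl:accumulator>

  <!-- Group messages whose descendants produced the same final value. -->
  <xsl:template match="/*">
    <groups>
      <xsl:for-each-group select="*" group-by="accumulator-after('hash')">
        <group hash="{current-grouping-key()}" count="{count(current-group())}"/>
      </xsl:for-each-group>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

Resetting the value at the start of each message means accumulator-after() on 
a message reflects only that message's descendants. The sibling-recursion 
approach for 2.0 follows.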

Something like this:

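<xsl:template match="*" mode="hash" as="xs:integer">
  <xsl:param name="h" as="xs:integer" select="0"/>
  <!-- Fold in the children first; with no children the result is empty. -->
  <xsl:variable name="c" as="xs:integer?">
    <xsl:apply-templates select="*[1]" mode="hash">
      <xsl:with-param name="h" select="$h"/>
    </xsl:apply-templates>
  </xsl:variable>
  <!-- Then the following siblings, falling back to the latest known value. -->
  <xsl:variable name="s" as="xs:integer?">
    <xsl:apply-templates select="following-sibling::*[1]" mode="hash">
      <xsl:with-param name="h" select="($c, $h)[1]"/>
    </xsl:apply-templates>
  </xsl:variable>
  <!-- Finally apply this element's own contribution. -->
  <xsl:apply-templates select="." mode="local-hash">
    <xsl:with-param name="h" select="($s, $c, $h)[1]"/>
  </xsl:apply-templates>
</xsl:template>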

and then in mode local-hash, you can define rules for individual elements that 
compute a hash for that particular element based on its attributes and text 
content; each template takes the old hash value and updates it as necessary. To 
combine two hash values you can use addition or, if you prefer, an XOR function, 
which you can get from the EXPath binary library.
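
As an illustration of such a rule, here is a sketch of a catch-all template 
for mode local-hash in XSLT 2.0. The multiplier 31 and the modulus 2147483647 
are arbitrary choices of this example, and multiplying the old value before 
adding is a small variation on plain addition that makes the result depend on 
the order of elements. Summing codepoints is a weak mixing step, so treat equal 
hashes as candidates only. The xs prefix is assumed to be bound to the XML 
Schema namespace on the enclosing xsl:stylesheet.

```xml
<!-- Catch-all rule: fold the element's name, attributes, and own text into $h. -->
<xsl:template match="*" mode="local-hash" as="xs:integer">
  <xsl:param name="h" as="xs:integer"/>
  <xsl:variable name="tokens" as="xs:string*">
    <xsl:sequence select="name()"/>
    <!-- Sort attributes by name so their order in the source does not matter. -->
    <xsl:for-each select="@*">
      <xsl:sort select="name()"/>
      <xsl:sequence select="concat(name(), '=', .)"/>
    </xsl:for-each>
    <!-- Only the element's own non-blank text nodes; descendants are
         handled by their own rule. -->
    <xsl:sequence select="text()[normalize-space()]/normalize-space(.)"/>
  </xsl:variable>
  <xsl:variable name="local" as="xs:integer"
      select="sum(string-to-codepoints(string-join($tokens, '|')))"/>
  <!-- Combine the old value with the local value (assumed constants). -->
  <xsl:sequence select="($h * 31 + $local) mod 2147483647"/>
</xsl:template>
```

A more specific rule, for example one matching the top-level message elements 
and returning $h unchanged, would leave their differing attributes out of the 
grouping key.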

Michael Kay
Saxonica
  

On 9 Jun 2016, at 23:09, Graydon graydon(_at_)marost(_dot_)ca 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hello all --

So I've got about half a gigabyte of XML messages describing various
health care actions.  Many of these are structural duplicates of each
other; the top elements differ by their attribute values, but the
structure and values of the descendant elements are the same.  The amount
of duplication varies from none to thousands.

I've got an apparently useful heuristic based on descendant attribute
values, but would -- it is health care data -- really like to have a
more robust way to group the elements into sets of equivalent top-level
names by their structural sameness.  (I can't hand-check the whole data
set.)

So I find myself wanting an equivalent of sha256sum for elements so I
could generate a grouping key from the descendant elements and their
associated attributes as a unit.

Is there such a thing?  Equivalent approaches?

Thanks!
Graydon

