Your solution looks right, and it works perfectly for ASCII-only XML input such as the following
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>this is just <word correction="too">to</word> funny</bar>
</foo>
yielding the desired output
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>this is just <word origerror="to">too</word> funny</bar>
</foo>
I now see that some of my own attempts also worked on the same ASCII-only example.
***** Suspected bug involving supplementary characters *****
But my real task involves a UTF-8 encoded input document written in Deseret Alphabet characters, which are encoded in the Supplementary Multilingual Plane (above U+FFFF). In that case, the text content of the <word> element in the output, which is copied from the original attribute value, is corrupted. I saw the same corruption in my own attempts and couldn’t understand what was happening.
Using the following input document (the Deseret Alphabet characters may not display correctly for you)
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>𐑄𐐮𐑅 𐐮𐑆 𐐾𐐲𐑅𐐻 <word correction="𐐻𐐭">𐑂𐐯𐑉𐐮</word> 𐑁𐐲𐑌𐐮</bar>
</foo>
the output, using your script, is corrupted. The text() value in the output is not the same as the original @correction value: extra characters are inserted (just one in this case, where the leading 𐐻 appears twice). The longer the original attribute value, the more extra characters are inserted.
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>𐑄𐐮𐑅 𐐮𐑆 𐐾𐐲𐑅𐐻 <word origerror="𐑂𐐯𐑉𐐮">𐐻𐐻𐐭</word> 𐑁𐐲𐑌𐐮</bar>
</foo>
This kind of corruption is exactly what I was seeing with my own scripts, which is what led me to bother the group.
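In case it helps anyone reproduce the comparison, here is a minimal Python sketch (separate from the XSLT, and not a diagnosis) that lists the code points of the original @correction value and of the corrupted text() value. The two string literals are copied from the documents above; the helper names code_points and utf16_units are my own, purely for illustration. It also reports how many UTF-16 code units each string needs, since every supplementary character is one code point but two UTF-16 code units. I only assume that this mismatch might be relevant; I have not confirmed that it is the cause.

```python
def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


def utf16_units(text):
    """Return the number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# Values copied from the sample documents in this message.
correction = "𐐻𐐭"   # @correction attribute in the input
result = "𐐻𐐻𐐭"      # text() of <word> in the corrupted output

for label, value in (("input @correction", correction),
                     ("output text()", result)):
    print(f"{label}: {len(value)} code points {code_points(value)}, "
          f"{utf16_units(value)} UTF-16 code units")
```

Python 3 strings are indexed by code point, so len() counts characters here, while the UTF-16 count shows what an implementation working in 16-bit units would see.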
I suspect a bug in the XSLT engine involving supplementary characters. Again, I’m using SaxonHE9-7-0-6J.
What’s my next step?
Thanks,
Ken
********************************