That sounds like an interesting problem. If the English->Klingon 
translator left a trace of which words translated to which, then it might 
be feasible (though difficult) to reconstruct the inline markup. Failing 
that, it seems nigh impossible. But that's assuming the document uses 
inline markup (which you didn't explicitly specify). If it's a matter of 
just getting different sections back in place, then you'd probably make 
multiple calls out to the translator, one for each blob of text. Of 
course, I suppose you could try the same for inline markup. It just 
might come out reading a bit funny and disconnected (but I suppose 
that's to be expected from an automatic translator anyway...).
Take this sample document:
<p>Hello this is <strong>bold</strong>. This is <em>italic</em>.</p>
You could call the translator for each non-whitespace-only text node in 
the document.
<xsl:template match="/">
 <translator-inputs>
   <xsl:apply-templates/>
 </translator-inputs>
</xsl:template>
<!-- Ignore whitespace-only text -->
<xsl:template match="text()"/>
<xsl:template match="text()[normalize-space()]">
 <to-translator>
   <xsl:copy/>
 </to-translator>
</xsl:template>
For the above document, that would yield:
<translator-inputs>
 <to-translator>Hello this is </to-translator>
 <to-translator>bold</to-translator>
 <to-translator>. This is </to-translator>
 <to-translator>italic</to-translator>
 <to-translator>.</to-translator>
</translator-inputs>
This reveals a further requirement: strip out and reconstruct 
punctuation that lies at the edges of a text blob (and that the 
translator would likely ignore anyway). You could do this using regular 
expressions. I'm not going to trouble myself with that right now, but 
the result might look like this:
<translator-inputs>
 <to-translator>Hello this is </to-translator>
 <to-translator>bold</to-translator>
 <to-translator sentence-boundary="yes">This is </to-translator>
 <to-translator>italic</to-translator>
 <to-translator sentence-boundary="yes"/>
</translator-inputs>
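For what it's worth, here is a rough sketch of how the punctuation step might look. It assumes an XSLT 2.0 processor, since matches() and replace() are not available in 1.0, and it only handles sentence-ending punctuation at the start of a blob, which is all the sample needs. The character class [.!?] is my own guess at what counts as a sentence boundary. Trailing punctuation would need similar treatment.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <translator-inputs>
      <xsl:apply-templates/>
    </translator-inputs>
  </xsl:template>

  <!-- Ignore whitespace-only text -->
  <xsl:template match="text()"/>

  <xsl:template match="text()[normalize-space()]">
    <to-translator>
      <!-- Flag blobs that begin with sentence-ending punctuation
           (assumed here to be ".", "!" or "?") -->
      <xsl:if test="matches(., '^\s*[.!?]')">
        <xsl:attribute name="sentence-boundary">yes</xsl:attribute>
      </xsl:if>
      <!-- Hand the translator the text without that leading punctuation -->
      <xsl:value-of select="replace(., '^\s*[.!?]+\s*', '')"/>
    </to-translator>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample document, this should produce the same five to-translator elements shown above, apart from indentation.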
I wouldn't worry about commas so much, or even periods in the middle of 
a blob of text. Theoretically, the translator will take care of those. 
It's only when we chop up text near the sentence boundaries (due to 
inline markup, e.g., a <b> tag) that we'd have to worry about that.
Then you'd hope to construct a result like this with help from the 
translator:
<results>
 <from-translator>Olleh siht si </from-translator>
 <from-translator>dlob</from-translator>
 <from-translator sentence-boundary="yes">Siht si </from-translator>
 <from-translator>cilati</from-translator>
 <from-translator sentence-boundary="yes"/>
</results>
To reconstruct the document, you'd run another transformation against 
the original document, changing only the non-whitespace-only text nodes:
<!-- By default, copy everything unchanged. -->
<xsl:template match="@* | node()">
 <xsl:copy>
   <xsl:apply-templates select="@* | node()"/>
 </xsl:copy>
</xsl:template>
<!-- But replace non-whitespace-only text nodes with their translated 
counterparts. -->
<xsl:template match="text()[normalize-space()]">
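 <xsl:variable name="text-node-position">
   <xsl:number level="any" count="text()[normalize-space()]"/>
 </xsl:variable>
 <!-- Compare explicitly against position(): the variable holds a tree
      fragment, not a number, so a bare [$text-node-position] predicate
      would be treated as a boolean and match every element. -->
 <xsl:variable name="result"
               select="document('translation-results.xml')/results/from-translator[position() = number($text-node-position)]"/>
 <xsl:if test="$result/@sentence-boundary='yes'">. </xsl:if>
 <xsl:value-of select="$result"/>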
</xsl:template>
I'll leave it up to you to determine whether the results would be 
acceptable or not. I think it largely depends on just how much inline 
markup is being used. Perhaps you'd care less about preserving bold, 
italics, and other inline markup and care only about paragraph 
boundaries. That would be much easier, using a similar approach to 
above. In that case, a text blob would be passed to the translator for 
each paragraph rather than every last text node. Either way, we can 
identify each blob of text by position.
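As a minimal sketch of the paragraph-level extraction, something like the following might do. It assumes the paragraphs are <p> elements in no namespace, as in the sample. A real XHTML document would put them in the XHTML namespace, so the match patterns would need a prefix bound to it.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <translator-inputs>
      <!-- Visit every paragraph in document order -->
      <xsl:apply-templates select="//p"/>
    </translator-inputs>
  </xsl:template>

  <!-- One blob per paragraph; inline markup is flattened away
       and runs of whitespace are collapsed. -->
  <xsl:template match="p">
    <to-translator>
      <xsl:value-of select="normalize-space(.)"/>
    </to-translator>
  </xsl:template>

</xsl:stylesheet>
```

For the sample document, that yields a single to-translator element containing "Hello this is bold. This is italic." The reconstruction step would mirror the one above, numbering with count="p" and replacing each paragraph's content rather than each text node.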
Evan
Robert P. J. Day wrote:
  it's been a while since i've written anything in XSLT so i'm going
to try to explain what a colleague is trying to do, assuming *i*
understand it.
  1) start with an involved XHTML document
  2) "extract" just those (english) parts that involve translatable
     text, and hand it to a translator
  3) translator translates english to, say, klingon
  4) rebuild original document with klingon content instead of english
as i understand it, the point of the extraction is that no one wants
to burden the translator with all of the XHTML tagging -- the
translator wants to get the text stripped of all the "clutter", at
which point, after translation, someone needs to be able to put the
document back together.
  is this even a reasonable thing to ask?  in order to reassemble the
document, i'm assuming one is going to have to ID every single bit of
text to have a reference to build backwards.
  thoughts on this?  has anyone done something like this?  or are you
all too busy laughing hysterically by now?
rday