Re: [xsl] Dynamically determining line wraps in HTML table cell output

The DITA Community i18n plugin provides general locale-aware code for doing 
line breaking, word breaking, and rendered size estimation in XSLT using Java 
extensions with Saxon.

The project is here: https://github.com/dita-community/org.dita-community.i18n
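
For a rough idea of what locale-aware word breaking involves, here is a minimal 
sketch that uses only the JDK's java.text.BreakIterator. It is not the plugin's 
code, and the class and method names (WordBreakSketch, words) are made up for 
illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreakSketch {

    /**
     * Splits text into words using the JDK's locale-aware word BreakIterator.
     * Hypothetical helper for illustration only; not part of the i18n plugin.
     */
    public static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            // The iterator also reports whitespace and punctuation runs;
            // keep only tokens that contain at least one letter or digit.
            if (token.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("John went to the store.", Locale.ENGLISH));
    }
}
```

A function like this is the sort of thing you would wrap as an extension 
function and call from XSLT.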

Although the XSLT is set up for use with the DITA Open Toolkit, the core pieces 
are general and should not be hard to adapt to other XSLT contexts. The 
extension functions work with Saxon HE if you register them through the Java 
API as described in the Saxon documentation (the Open Toolkit does this 
automatically starting with version 3.3). If you are using a licensed Saxon 
edition you can instead use the Java reflection support to access the extension 
functions.

The code includes a general dictionary-based solution for Simplified Chinese 
sorting and grouping using an open-source Chinese dictionary.

For the purpose of generating Word documents you may also be interested in my 
Wordinator project: https://github.com/drmacro/wordinator

The Wordinator provides a general solution for going from arbitrary XML to DOCX 
by using a general "simple word processing" XML that is then converted to DOCX 
using the Apache POI library. 

Out of the box the Wordinator is optimized for going from HTML to DOCX, but it 
can be adapted to any source markup. To customize it, you implement an XSLT 
transform that generates the simple word processing XML, which the Wordinator 
Java code then uses to generate the DOCX.

For the use case of producing Word tables with formatted text flowed into them 
you could adapt the i18n size estimation code along with the word and line 
breaking to generate the Word table cells.
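
As a sketch of how that could fit together (not the i18n plugin's API), the 
following greedy fill splits a sentence at line-break opportunities so that 
each fragment stays within a maximum estimated width. The class FragmentSketch, 
the method fragments, and the widthOf estimator are hypothetical names; the 
estimator in main simply counts characters, and a real one would use font 
metrics or the plugin's size estimation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.ToDoubleFunction;

public class FragmentSketch {

    /**
     * Greedily splits text into fragments whose estimated width does not
     * exceed maxWidth, breaking only at locale-aware line-break opportunities.
     * A single unbreakable unit wider than maxWidth gets a fragment of its own.
     * widthOf is a caller-supplied estimator (an assumption of this sketch).
     */
    public static List<String> fragments(String text, double maxWidth, Locale locale,
                                         ToDoubleFunction<String> widthOf) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int fragStart = it.first();
        int lastFit = fragStart; // last break position known to end a fitting fragment
        for (int brk = it.next(); brk != BreakIterator.DONE; brk = it.next()) {
            String candidate = text.substring(fragStart, brk).trim();
            if (widthOf.applyAsDouble(candidate) > maxWidth && lastFit > fragStart) {
                // Candidate is too wide: close the fragment at the previous break.
                out.add(text.substring(fragStart, lastFit).trim());
                fragStart = lastFit;
            }
            lastFit = brk;
        }
        String tail = text.substring(fragStart).trim();
        if (!tail.isEmpty()) {
            out.add(tail);
        }
        return out;
    }

    public static void main(String[] args) {
        // Character count stands in for a real rendered-width estimate.
        for (String f : fragments("aaa bbb ccc ddd", 7, Locale.ENGLISH, s -> s.length())) {
            System.out.println(f);
        }
    }
}
```

With the character-count estimator and a maximum width of 7, the input 
"aaa bbb ccc ddd" comes out as the two fragments "aaa bbb" and "ccc ddd"; each 
fragment would then go into its own table.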

The i18n code was originally implemented to support the creation of EPUBs in 
which each EPUB page was a single HTML page but the input was of arbitrary 
size, so I had to implement page layout in XSLT. It's not how I would do it 
today if I had to do it over again, but it did result in some useful general 
facilities for doing rough text layout directly in XSLT.

Cheers,

E. 

--
Eliot Kimber
http://contrext.com
 

On 4/22/19, 8:47 PM, "Larry Hayashi lhtrees(_at_)gmail(_dot_)com" 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

    I have a problem that I am not sure how to tackle. I need to transform
    long sentences into multiple HTML tables for inclusion into a
    Microsoft Word document. With short sentences I have no issues, and
    the HTML tables are formatted for inclusion in Word without any
    problems. But with longer sentences, I have to divvy up the sentence
    into fragments. The issue for me is figuring out how to know when to
    divide a longer text sentence into multiple tables so each table fits
    width-wise in the Word document. Are there ways to calculate width
    using XSL other than just string length? The reason I am creating
    separate tables is because each of these will ultimately be
    interlinearized with morphemes and glosses underneath. Refer to
    Leipzig glossing rules
    (https://www.eva.mpg.de/lingua/pdf/Glossing-Rules.pdf). The problem is
    actually much more complex as the glosses in subsequent rows may be
    longer than the words themselves, and the glosses align with the start
    of each word, but I thought I would start with this initial problem
    and see what ideas folks might recommend. I was also wondering if this
    is the kind of thing that XSL-FO might be useful for. I have very
    limited familiarity with XSL-FO.
    
    I suspect that my easiest course of action is to:
    a. pre-determine the left and right margins, indents, etc. for the
    Word document and define a style for the example sentences.
    b. determine the maximum width of a line based on above.
    c. determine the number of m characters (m being the max width
    character possible) at a specified font-size that can fit within that
    width in (b)
    d. use the number in (c) in the XSLT to ensure that sentence fragments
    are always shorter than this number of characters.
    
    The above strategy will work most of the time for Roman-based
    orthographies but I suspect it will be an issue for non-Roman
    orthographies.  So, another thought: I suppose one could call an
    external function fDetermineWrappedText(cell_width, font, font-size,
    string) that would populate a table cell and then determine the
    portion that wraps, then return that fragment back to the XSLT. The
    XSLT could then put that returned fragment into its own table. I found
    some suggestions on how to find the line wraps here:
    https://stackoverflow.com/questions/3738490/finding-line-wraps. I have
    minimal experience using external functions in XSLT but I think this
    strategy may be more helpful in the long run.
    
    Simplified source example:
    <document>
    <sentence>John went to the store.</sentence>
    <sentence>Lorem ipsum dolor sit amet, ac et et inceptos eget
    sollicitudin, in urna velit et consectetuer eget cras, dictum erat
    turpis sed velit donec blandit, integer volutpat at dictum nullam
    nunc.</sentence>
    </document>
    
    XSLT process.
    
    Output example:
    <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
            <title></title>
        </head>
        <body>
            <table>
                <tr><td>John went to the store.</td></tr>
            </table>
    <hr/>
            <table>
                <tr><td>Lorem ipsum dolor sit amet, ac et et inceptos eget
    </td></tr>
            </table>
            <table>
                <tr><td>sollicitudin, in urna velit et consectetuer eget
    cras,</td></tr>
            </table>
            <table>
                <tr><td>dictum erat turpis sed velit donec blandit,
    integer</td></tr>
            </table>
            <table>
                <tr><td>volutpat at dictum nullam nunc.</td></tr>
            </table>
    <hr/>
        </body>
    </html>
    
    Any suggestions in the overall approach to the problem and what you
    would do if using XSLT?
    
    Thanks!
    Larry
    
    
    