Re: [xsl] Dynamically determining line wraps in HTML table cell output

Looks very promising! Thank you!

On Tue, Apr 23, 2019 at 7:30 AM Eliot Kimber ekimber(_at_)contrext(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:


The DITA Community i18n plugin provides general locale-aware code for doing 
line breaking, word breaking, and rendered size estimation in XSLT using Java 
extensions with Saxon.
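As a rough illustration of what locale-aware line breaking has to account for (this is not the plugin's code), the Python sketch below splits text into unbreakable units. Space-separated words stay whole, while each CJK ideograph or kana character becomes its own unit, because those scripts allow a break between most characters. The Unicode ranges checked are an assumption covering only the common blocks. Production line breaking follows the Unicode line breaking algorithm (UAX #14), which Java exposes through java.text.BreakIterator.getLineInstance().

```python
def is_cjk(ch: str) -> bool:
    """Rough test for CJK ideographs and kana (a few common blocks only)."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3400 <= cp <= 0x4DBF   # CJK Extension A
            or 0x3040 <= cp <= 0x30FF)  # Hiragana and Katakana


def break_units(text: str) -> list[str]:
    """Split text into units that must not be broken internally.

    Space-separated runs stay whole (the trailing space stays attached),
    while each CJK character is emitted as a separate unit.
    """
    units: list[str] = []
    current = ""
    for ch in text:
        if is_cjk(ch):
            if current:
                units.append(current)
                current = ""
            units.append(ch)
        elif ch == " ":
            current += ch
            units.append(current)
            current = ""
        else:
            current += ch
    if current:
        units.append(current)
    return units
```

Joining the units always reproduces the input, so a layout routine can add units to a line until the next one would overflow.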

The project is here: https://github.com/dita-community/org.dita-community.i18n

While the XSLT has been set up for use in the DITA Open Toolkit, the core 
bits are general and it shouldn't be too hard to adapt to other XSLT 
contexts. The extension functions will work with Saxon HE if you use the Java 
API to register the extension functions per the Saxon documentation (the Open 
Toolkit starting with version 3.3 does this automatically). If you are using
a licensed Saxon edition (PE or EE), you can instead use Saxon's reflexive
Java extension mechanism to access the extension functions.

The code includes a general dictionary-based solution for Simplified Chinese 
sorting and grouping using an open-source Chinese dictionary.
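A dictionary-based approach can be sketched in a few lines: look up each character's pinyin reading, sort on the resulting Latin string, and group under its first letter. The PINYIN table below is a hypothetical eight-character stand-in for a real open-source dictionary, and the sketch ignores characters with several readings (都, for example, is read du in 成都 but dou elsewhere), which a real solution has to resolve from word-level entries or context.

```python
from itertools import groupby

# Hypothetical miniature dictionary: character -> toneless pinyin.
# A real solution would load a full open-source dictionary instead.
PINYIN = {"北": "bei", "京": "jing", "上": "shang", "海": "hai",
          "广": "guang", "州": "zhou", "成": "cheng", "都": "du"}


def pinyin_key(term: str) -> str:
    """Sort key: pinyin syllables joined by spaces; unknown characters kept as-is."""
    return " ".join(PINYIN.get(ch, ch) for ch in term)


def group_by_initial(terms: list[str]) -> dict[str, list[str]]:
    """Sort terms by pinyin, then group them under the key's first letter."""
    ordered = sorted(terms, key=pinyin_key)
    return {letter.upper(): list(items)
            for letter, items in groupby(ordered, key=lambda t: pinyin_key(t)[0])}
```

For example, group_by_initial(["上海", "北京", "成都", "广州"]) orders the terms as 北京, 成都, 广州, 上海 and returns them under the headings B, C, G and S.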

For the purpose of generating Word documents you may also be interested in my 
Wordinator project: https://github.com/drmacro/wordinator

The Wordinator provides a general solution for going from arbitrary XML to 
DOCX by using a general "simple word processing" XML that is then converted 
to DOCX using the Apache POI library.

Out of the box, the Wordinator is optimized for going from HTML to DOCX, but
it can of course be adapted to any source markup. To customize it, you
implement an XSLT transform that generates the simple word processing XML,
which the Wordinator Java code then uses to generate the DOCX.

For the use case of producing Word tables with formatted text flowed into
them, you could adapt the i18n size estimation code, along with the word and
line breaking, to generate the Word table cells.
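Size estimation can be as simple as summing per-character advance widths. The sketch below is a stand-in under stated assumptions, not the plugin's API: LATIN_WIDTHS and DEFAULT_LATIN_WIDTH are made-up em values (real ones come from the font's metrics and vary by font), and CJK characters are assumed to occupy a full em. Comparing such estimates against a Word table cell's width tells you roughly where a line has to break.

```python
# Hypothetical advance widths in em units for a few Latin characters;
# real values come from the font's metrics and differ between fonts.
LATIN_WIDTHS = {"i": 0.28, "l": 0.28, "m": 0.83, "w": 0.72, " ": 0.25}
DEFAULT_LATIN_WIDTH = 0.5  # assumed average for characters not in the table


def is_full_width(ch: str) -> bool:
    """CJK ideographs and kana are typically set on a square 1-em body."""
    cp = ord(ch)
    return 0x3040 <= cp <= 0x30FF or 0x3400 <= cp <= 0x4DBF or 0x4E00 <= cp <= 0x9FFF


def estimate_width_pt(text: str, font_size_pt: float) -> float:
    """Rough rendered width of `text` in points at the given font size."""
    ems = sum(1.0 if is_full_width(ch) else LATIN_WIDTHS.get(ch, DEFAULT_LATIN_WIDTH)
              for ch in text)
    return ems * font_size_pt
```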

The i18n code was originally implemented to support the creation of EPUBs 
where each EPUB page was a single HTML page but input was of arbitrary size, 
so I had to implement page layout in XSLT (it's not how I would do it if I
were starting over today, but it did result in some useful general
facilities for doing rough text layout directly in XSLT).

Cheers,

E.

--
Eliot Kimber
http://contrext.com


On 4/22/19, 8:47 PM, "Larry Hayashi lhtrees(_at_)gmail(_dot_)com" 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

    I have a problem that I am not sure how to tackle. I need to transform
    long sentences into multiple HTML tables for inclusion into a
    Microsoft Word document. With short sentences I have no issues, and
    the HTML tables are formatted for inclusion in Word without any
    problems. But with longer sentences, I have to divvy up the sentence
    into fragments. The issue for me is figuring out how to know when to
    divide a longer text sentence into multiple tables so each table fits
    width-wise in the Word document. Are there ways to calculate width
    using XSL other than just string length? I am creating separate tables
    because each one will ultimately be interlinearized with morphemes and
    glosses underneath (see the Leipzig Glossing Rules,
    https://www.eva.mpg.de/lingua/pdf/Glossing-Rules.pdf). The problem is
    actually much more complex, because the glosses in subsequent rows may be
    longer than the words themselves and each gloss aligns with the start
    of its word, but I thought I would start with this initial problem
    and see what ideas folks might recommend. I was also wondering if this
    is the kind of thing that XSL-FO might be useful for. I have very
    limited familiarity with XSL-FO.
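The gloss complication can be handled by treating each word and its gloss as one column whose width is the wider of the two, then packing columns into rows (one table per row). In this sketch, measure is any width function (len for a character count, or a font-based estimator) and gap is an assumed inter-column spacing; both are illustrative parameters, not part of any existing tool.

```python
from typing import Callable


def interlinear_rows(pairs: list[tuple[str, str]], max_width: float,
                     measure: Callable[[str], float],
                     gap: float) -> list[list[tuple[str, str]]]:
    """Split (word, gloss) pairs into rows that each fit within max_width.

    A column is as wide as the wider of the word and its gloss, so a long
    gloss can force a break even when the words alone would fit.
    """
    rows: list[list[tuple[str, str]]] = []
    row: list[tuple[str, str]] = []
    used = 0.0
    for word, gloss in pairs:
        col = max(measure(word), measure(gloss))
        needed = col if not row else used + gap + col
        if row and needed > max_width:
            rows.append(row)              # close the current row (table)
            row, used = [(word, gloss)], col
        else:
            row.append((word, gloss))
            used = needed
    if row:
        rows.append(row)
    return rows
```

With a fragment of the Lezgian example from the Leipzig rules, "abur-u-n" glossed "they-OBL-GEN" yields a 12-character column although the word has only 8 characters. At a 20-character limit that pushes "ferma" onto a second row, even though the three words alone would fit on one.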

    I suspect that my easiest course of action is to:
    a. pre-determine the left and right margins, indents, etc. for the
    Word document and define a style for the example sentences.
    b. determine the maximum width of a line based on the above.
    c. determine the number of m characters (m being the widest
    character) at a specified font size that can fit within the width
    in (b)
    d. use the number in (c) in the XSLT to ensure that sentence fragments
    are always shorter than this number of characters.
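Steps (a) to (d) translate fairly directly into code. In this Python sketch, the page and margin figures (a 612 pt US Letter page, 1-inch margins, a hypothetical 0.5-inch indent) and the 0.78 em width of "m" are assumptions for illustration; the real width of "m" depends on the font. textwrap.wrap is told not to split long words or hyphenated words, so fragments break only at whitespace.

```python
import math
import textwrap


def line_width_pt(page_pt: float, left_pt: float, right_pt: float,
                  indent_pt: float = 0.0) -> float:
    """Step (b): usable line width once margins and indents are subtracted."""
    return page_pt - left_pt - right_pt - indent_pt


def max_chars(width_pt: float, m_width_em: float, font_size_pt: float) -> int:
    """Step (c): how many 'm' characters fit across the line."""
    return math.floor(width_pt / (m_width_em * font_size_pt))


def fragments(sentence: str, limit: int) -> list[str]:
    """Step (d): split at spaces so no fragment exceeds `limit` characters."""
    return textwrap.wrap(sentence, width=limit,
                         break_long_words=False, break_on_hyphens=False)


# Assumed figures: 612 pt page, 72 pt margins, 36 pt indent, 11 pt type.
width = line_width_pt(612, 72, 72, 36)   # 432.0
limit = max_chars(width, 0.78, 11)       # 50
```

Because "m" is the widest Latin letter, the limit is conservative for most text. That conservatism breaks down for scripts whose glyphs are wider than "m", which is the concern raised next.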

    The above strategy will work most of the time for Roman-based
    orthographies, but I suspect it will be an issue for non-Roman
    orthographies. So, another thought: I suppose one could call an
    external function fDetermineWrappedText(cell_width, font, font-size,
    string) that would populate a table cell and then determine the
    portion that wraps, then return that fragment back to the XSLT. The
    XSLT could then put that returned fragment into its own table. I found
    some suggestions on how to find the line wraps here:
    https://stackoverflow.com/questions/3738490/finding-line-wraps. I have
    minimal experience using external functions in XSLT but I think this
    strategy may be more helpful in the long run.
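The proposed fDetermineWrappedText can be prototyped without rendering a real table cell. In the sketch below the names are illustrative, and a measure callback stands in for the font and font-size arguments. The function returns the words that fit on the first line plus the remainder; calling it repeatedly yields one fragment per table. In a Java extension function, measure could be backed by java.awt.FontMetrics.stringWidth, although any such measurement only approximates Word's own layout.

```python
from typing import Callable


def determine_wrapped_text(cell_width: float, measure: Callable[[str], float],
                           text: str) -> tuple[str, str]:
    """Return (text that fits on the first line, remainder that would wrap)."""
    words = text.split()
    if not words:
        return "", ""
    fitted = words[0]  # always take one word, even if it overflows the cell
    for i, word in enumerate(words[1:], start=1):
        candidate = fitted + " " + word
        if measure(candidate) > cell_width:
            return fitted, " ".join(words[i:])
        fitted = candidate
    return fitted, ""


def split_into_tables(cell_width: float, measure: Callable[[str], float],
                      text: str) -> list[str]:
    """Call determine_wrapped_text until nothing is left: one fragment per table."""
    fragments: list[str] = []
    rest = text.strip()
    while rest:
        first, rest = determine_wrapped_text(cell_width, measure, rest)
        fragments.append(first)
    return fragments
```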

    Simplified source example:
    <document>
    <sentence>John went to the store.</sentence>
    <sentence>Lorem ipsum dolor sit amet, ac et et inceptos eget
    sollicitudin, in urna velit et consectetuer eget cras, dictum erat
    turpis sed velit donec blandit, integer volutpat at dictum nullam
    nunc.</sentence>
    </document>

    XSLT process.

    Output example:
    <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
            <title></title>
        </head>
        <body>
            <table>
                <tr><td>John went to the store.</td></tr>
            </table>
            <hr/>
            <table>
                <tr><td>Lorem ipsum dolor sit amet, ac et et inceptos eget</td></tr>
            </table>
            <table>
                <tr><td>sollicitudin, in urna velit et consectetuer eget cras,</td></tr>
            </table>
            <table>
                <tr><td>dictum erat turpis sed velit donec blandit, integer</td></tr>
            </table>
            <table>
                <tr><td>volutpat at dictum nullam nunc.</td></tr>
            </table>
            <hr/>
        </body>
    </html>
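Once a sentence has been split into fragments, producing the output structure above is mechanical: one table per fragment and an <hr/> after each sentence. In XSLT this would typically be an xsl:for-each over the fragments. For prototyping outside XSLT, the same structure can be built with Python's xml.etree.ElementTree, as in this sketch, which assumes the fragments have already been computed.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML)  # serialize XHTML as the default namespace


def q(name: str) -> str:
    """Qualify an element name with the XHTML namespace."""
    return f"{{{XHTML}}}{name}"


def build_document(sentences: list[list[str]]) -> ET.Element:
    """Build <html> with one <table> per fragment and an <hr/> after each sentence."""
    html = ET.Element(q("html"))
    head = ET.SubElement(html, q("head"))
    ET.SubElement(head, q("title"))
    body = ET.SubElement(html, q("body"))
    for fragments in sentences:
        for fragment in fragments:
            row = ET.SubElement(ET.SubElement(body, q("table")), q("tr"))
            ET.SubElement(row, q("td")).text = fragment
        ET.SubElement(body, q("hr"))
    return html


doc = build_document([
    ["John went to the store."],
    ["Lorem ipsum dolor sit amet,", "ac et et inceptos eget"],
])
print(ET.tostring(doc, encoding="unicode"))
```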

    Any suggestions in the overall approach to the problem and what you
    would do if using XSLT?

    Thanks!
    Larry

--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--