xsl-list

Re: [xsl] Re: XML/XHTML fragment to text

2007-08-15 09:59:40
Hi Abel,

You've got it 99% right!

A to D: your understanding is perfect, and your wording is much clearer than mine!
[English is not my mother tongue]

> E. You seem to have a preference for Xalan-C (but the above is *much*
> easier with Saxon 8.9!!!)

This one is not quite right. Let me discuss it as you suggested.
Personally I would prefer Saxon: XSLT 2.0 makes things so much easier.
And my advice would be to buy the schema-aware version, because,
for example, it would be stupid to write code to check the input
when relying on schema validation is so much safer and easier
to maintain.
For my personal sites, I'm generating the HTML and XHTML versions
from XML source using non-validating Saxon (at home I don't bother
with schemas). I first used Xalan, but then I had to write a make.bat
and separate stylesheets to generate the whole site.
So I switched to Saxon and it made things so much better. Typically,
it takes 7 seconds to generate 400 HTML pages on my laptop, but at
home I don't care whether it takes 7 seconds or the 4.5 seconds it
took with Xalan, because I'm not regenerating my sites every other day!

But at work, the only thing that has been authorized for now is
Xalan-C. It runs in batches (jobs) on AIX machines.
The reason why they are not considering another transformation engine
at the moment is performance. Even for a small transformation, if you
run Saxon or Xalan-J, you have to set up and run a JVM in your
Unix batch.
Launching the JVM has a cost in memory and time.
And even if you don't count the JVM cost, Saxon is Java code, so it
has to pay the Java overhead compared to code written
in C++... Although Saxon may perform faster on some specific
templates where it has better optimisations, on an "average" template
it will still be slower because it's Java versus C++.
The goal is to be able to run for a base of 5 million customers, so
we have to count every second in our batch process.

But things might change a little bit now, because for SEPA (Single
Euro Payments Area) they chose to write a Java program and run it
in our main batch to transform the XML files (defined by ISO 20022)
to a fixed-length format understandable by our legacy Cobol programs.
So they are definitely running a JVM inside the main batch,
and I will point that out to the people in charge of choosing standard
software. The cost of maintaining a Java program for such a
transformation might also be higher than having just an XSL
stylesheet where possible (I'm not on the SEPA project, so I haven't
checked whether it is possible to transform the ISO 20022 XML easily).


> For instance, making the records fixed length is as easy
> as 1-2-3 (where you would need recursive templates in XSLT 1.0)

I must have read only part of the XSLT 2.0 documentation then... because
it still does *not* look so easy to me, even with XSLT 2.0.
Even for a string, you still have to write things like:
substring(concat(myString,$padding),1,$N) to pad it correctly
... which you could already do in 1.0 (with no recursion, provided
$padding is long enough). I think I saw a padding function in EXSLT,
but it doesn't seem to have been made standard in 2.0. Of course
XSLT 1.0 is even worse: for example, you don't have any function to
handle dates, and that's painful.
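
To make the padding idiom above concrete, here is a minimal sketch of it wrapped in an XSLT 2.0 stylesheet function. The my: namespace URI, the function name my:pad-right and the sample values in the "main" template are assumptions made up for illustration; only substring(), concat() and string-join() are standard. The padding string is generated to the requested width, so it is always long enough. Note that the width is counted in characters, not in bytes.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:fixed-length"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: right-pad (or truncate) $s to exactly
       $width characters, using spaces as padding. -->
  <xsl:function name="my:pad-right" as="xs:string">
    <xsl:param name="s" as="xs:string?"/>
    <xsl:param name="width" as="xs:integer"/>
    <!-- Build $width spaces, append them, then cut at $width. -->
    <xsl:sequence select="substring(
        concat($s, string-join(for $i in 1 to $width return ' ', '')),
        1, $width)"/>
  </xsl:function>

  <!-- Illustrative entry point with made-up sample values. -->
  <xsl:template name="main">
    <xsl:value-of select="concat('[', my:pad-right('Smith', 10), ']')"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="concat('[', my:pad-right('Abcdefghijkl', 10), ']')"/>
  </xsl:template>

</xsl:stylesheet>
```

With these sample values the first call returns 'Smith' followed by five spaces, and the second returns 'Abcdefghij', the input cut at ten characters.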

Or we could probably write (or buy) "generic" patterns to transform to fixed-length.

> 3. When the HTML field passes by, use unparsed-text(...temp file
> here...) to include the textized HTML data

I haven't used this function yet; it looks like a very elegant way
to solve the problem, except for one little thing (but I'll check
the documentation).
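
For reference, a minimal sketch of how unparsed-text() might be used for this step. The element name html-field, the parameter name html-text-uri and its default value are hypothetical; unparsed-text() and unparsed-text-available() are standard XSLT 2.0 functions. It assumes an earlier step has already written the textized HTML to that file in UTF-8.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical parameter: URI of the temporary file that holds
       the textized HTML (assumed to be written by an earlier step). -->
  <xsl:param name="html-text-uri" select="'html-field.txt'"/>

  <!-- Hypothetical element name for the HTML field in the input. -->
  <xsl:template match="html-field">
    <xsl:choose>
      <!-- Check first, so a missing file gives a clear message. -->
      <xsl:when test="unparsed-text-available($html-text-uri, 'UTF-8')">
        <xsl:value-of select="unparsed-text($html-text-uri, 'UTF-8')"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message terminate="yes">
          <xsl:text>Cannot read </xsl:text>
          <xsl:value-of select="$html-text-uri"/>
        </xsl:message>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

A relative URI such as the default above is resolved against the base URI of the stylesheet, not of the source document.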

The last headache is the "UTF-8" problem!
Because fixed-length means fixed-length in *bytes*.
That is fine with ISO 8859-1, where one char = one byte.
But as we run internationally and also have Greece, Russia, etc.,
and could run in China, we need UTF-8.

For that, with XSLT 1.0, I agree with you: I had to build insane
recursive templates to calculate the length in bytes of a
UTF-8 string. It's so insane that I tend to think at this point
we would rather write a Java or even a Cobol program.
With XSLT 2.0 it's easier, but still difficult because you
are doing things only the serializer should do... or is there
a function I didn't notice that can return a string length in
bytes and not in chars?
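
As far as I know, there is no built-in XPath 2.0 function that returns a length in bytes, but string-to-codepoints() makes the calculation possible without recursion. The sketch below is an assumption-laden illustration, not a tested production routine: the my: namespace, the function name my:utf8-length and the sample strings are made up, and it simply applies the standard UTF-8 rule of 1, 2, 3 or 4 bytes per code point.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:fixed-length"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: number of bytes $s occupies in UTF-8. -->
  <xsl:function name="my:utf8-length" as="xs:integer">
    <xsl:param name="s" as="xs:string?"/>
    <!-- UTF-8 uses 1 byte below U+0080, 2 below U+0800,
         3 below U+10000, and 4 bytes from U+10000 upwards. -->
    <xsl:sequence select="xs:integer(sum(
        for $cp in string-to-codepoints($s) return
          if ($cp lt 128) then 1
          else if ($cp lt 2048) then 2
          else if ($cp lt 65536) then 3
          else 4))"/>
  </xsl:function>

  <!-- Illustrative entry point with made-up sample strings. -->
  <xsl:template name="main">
    <xsl:value-of select="my:utf8-length('abc')"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="my:utf8-length('&#233;t&#233;')"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="my:utf8-length('&#8364;')"/>
  </xsl:template>

</xsl:stylesheet>
```

Here 'abc' counts as 3 bytes, 'été' as 5 (two 2-byte characters plus one ASCII letter) and the euro sign as 3. Padding a field to a fixed byte width would additionally need care not to cut a multi-byte character in half when truncating.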


> Not sure if you mean if this is already dropped by your team.

The last memo I read suggested that we could do otherwise,
because mixing fixed-length and variable text doesn't look like a
good architecture to start with.
You came to the same conclusion, your advice being to separate
the variable part (e.g. HTML) into a temporary file, even if your
templates are smarter, and to put every piece together again.
But as I'm on holiday now, I'll have to check the project
status when I'm back in September!


Thanks again for all your help.