xsl-list

RE: [xsl] plea for help...

2006-03-09 15:46:39
Wendell,
I attended.

It was very well done. A great help for beginners as well as good
insights for those with lots of battle scars.

Thanks,
Mike Ferrando
Library Technician
Library of Congress
Washington, DC
202-707-4454

--- Wendell Piez <wapiez(_at_)mulberrytech(_dot_)com> wrote:

Walter,

At Mulberry we recently gave a seminar on the topic of converting 
HTML to XML, so the issues are fresh in my mind.

You're facing a fairly complex set of problems, but they can be 
simplified (as you are discovering) by distinguishing between

A. The syntactic conversion of HTML to XML
B. The "semantic" conversion from HTML display-oriented tagging to a
stronger form of tagging in XML.

Other contributors have posted links to tools that help you with job
A -- Tidy and its ilk -- and it appears you've got a handle on that.
This work can be largely or entirely automated. Of course, what you
get out the other end is still HTML tagging, albeit in XML syntax
(it'll be either valid XHTML or a similar XML-compliant HTML), so as
you're finding, it's not yet ready for everything you might do with
well-designed XML markup. But having it in XML syntax is already a
big step, because you can then use more and better tools on it to
take it the rest of the way -- including XSLT (which is why the
question isn't totally off topic here).
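To make the A/B split concrete: once Tidy or a similar tool has done job A, any generic XML parser can read the result, while raw HTML with unclosed tags is rejected outright. A minimal sketch using Python's standard library (Python is only a stand-in here, and the two sample strings are made up for illustration):

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if a generic XML parser accepts the text as-is."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Tidy-style output: every element closed, XHTML namespace declared.
xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>One<br/>two</p></body></html>'
# Legacy HTML: <br> and <p> are never closed, so XML parsing fails.
legacy_html = '<html><body><p>One<br>two</body></html>'

print(is_well_formed(xhtml))        # True
print(is_well_formed(legacy_html))  # False
```

Passing this check only means job A is done; the markup is still display-oriented HTML, which is what job B has to deal with.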

To do conversion B, however, is an entirely different kettle of fish
-- and it is beyond the scope of this list, I'm afraid.

As long as I'm already on it, however, I am willing to comment that
the scope and difficulty of conversion B are directly related both to
the quality of tagging in your source (HTML can be "clean" or
"dirty", consistent or messy, even after it's made XML-conformant in
its syntax) and, most dramatically, to the nature of your target tag
set and to the feasibility of mapping from the HTML you have to this
target.

Sometimes this conversion can be automated; sometimes it can be
mostly automated; often it requires a good measure of attention from
human beings to determine how things should be converted in any
given case.
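A rough sketch of what "mostly automated" can look like: the Python below (standing in for an XSLT or similar pipeline) maps the (element, class) pairs it recognizes and sets everything else aside for a human to decide. The `MAPPING` table, the `byline` class, and the sample document are all hypothetical; a real table would come from the design of the target tag set.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical mapping: (XHTML element, class attribute) -> target element.
MAPPING = {
    ("h1", None): "title",
    ("p", "byline"): "author",
    ("p", None): "para",
}

def local(tag):
    """Drop the XHTML namespace, e.g. '{http://www.w3.org/1999/xhtml}p' -> 'p'."""
    return tag[len(XHTML):] if tag.startswith(XHTML) else tag

def triage(xhtml_text):
    """Split the body's child elements into auto-mappable and needs-review."""
    body = ET.fromstring(xhtml_text).find(XHTML + "body")
    auto, review = Counter(), Counter()
    for el in body:
        key = (local(el.tag), el.get("class"))
        if key in MAPPING:
            auto[MAPPING[key]] += 1
        else:
            review[key] += 1  # no rule: a person decides what this becomes
    return auto, review

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>t</title></head>
<body>
  <h1>Plea for help</h1>
  <p class="byline">Walter Torres</p>
  <p>Some body text.</p>
  <p class="note">Sidebar or footnote?</p>
  <table><tr><td>layout table</td></tr></table>
</body>
</html>"""

auto, review = triage(SAMPLE)
print(dict(auto))
print(dict(review))
```

The review bucket is the useful output: it shows early where the HTML you actually have does not map cleanly onto the target.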

The design of that target markup, however, is critical; this factor
alone can make or break your project. There is an infinity of things
potentially expressible in XML, which a machine, even one programmed
with very sophisticated heuristics, will not know how to tag
correctly, even when it's starting with some kind of HTML tagging.

Accordingly, successful efforts at this kind of conversion generally
include both designing that format up front and controlling its
design carefully. Design it to concrete requirements, not just to
what you think might be useful or fun to have some day, and don't be
over-ambitious. You can't convert to a target you can't see. But if
you have a design, the places where conversion is easy or difficult
will fairly quickly come to light, and you can figure out how to
deal with them.

I think earlier someone suggested you prototype this first before 
attempting it. That's very good advice.

There are also professionals who will gladly share their experience
in this area, if you are in a position to save money over the long
term by investing it intelligently in the near term.

Good luck,
Wendell

At 11:52 AM 3/9/2006, you wrote:

On Wed, March 8, 2006 5:28 pm, Florent Georges wrote:
Walter Torres wrote:


1) convert HTML into well-formed HTML (many files are not)
2) convert well-formed HTML into XHTML


Tidy HTML will give you XHTML from HTML.

Yes, just found it late last night. Been playing with it all morning.

Getting it to work in PHP5 is what I'm focusing on now.


3) convert XHTML into XML


An XHTML instance is already an XML instance.

Yes, I understand that.

But I'm trying to get this to "pure" XML, with no display-oriented
markup whatsoever!

The idea here is to have as "raw/naked" a file as possible; that way
any system can display it as it sees fit.


If you want to translate the instance from XHTML to another XML
document type, XSLT may be of great help.
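A minimal sketch of such a translation, assuming a hypothetical target vocabulary (`article`, `title`, `author`, `para`). It is written with Python's `xml.etree.ElementTree` rather than XSLT, but plays the same role: display tags such as `<font>`, `<b>`, and `<i>` are dropped and only their text is kept, which is the "no display markup" goal described above.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical mapping from (element, class) to neutral target names.
MAPPING = {("h1", None): "title", ("p", "byline"): "author", ("p", None): "para"}

def to_neutral(xhtml_text):
    """Rebuild mapped body blocks as neutral XML, flattening inline display tags."""
    body = ET.fromstring(xhtml_text).find(XHTML + "body")
    article = ET.Element("article")
    for el in body:
        name = el.tag[len(XHTML):] if el.tag.startswith(XHTML) else el.tag
        target = MAPPING.get((name, el.get("class")))
        if target is None:
            continue  # unmapped blocks would be routed to a human reviewer instead
        child = ET.SubElement(article, target)
        # itertext() collects text from <b>, <font>, etc. while discarding the tags;
        # split/join collapses runs of whitespace into single spaces.
        child.text = " ".join("".join(el.itertext()).split())
    return ET.tostring(article, encoding="unicode")

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1><font color="red">Plea for help</font></h1>
<p class="byline">Walter <b>Torres</b></p>
<p>Some <i>body</i> text.</p>
</body></html>"""

print(to_neutral(SAMPLE))
```

Flattening every inline element is deliberately crude; a real target would likely keep some inline semantics (emphasis, citations) under its own names.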

Sure, that way I can create a look for website A that is different
from website B, then create text-only or RTF output, email as plain
text or HTML, or even output for web-enabled phones.

This is why I was asking how different folks handle this kind of
content: what kind of markup it contains, etc.


4) create XSLTs to transpose XML back to HTML for page display

Here again, XSLT may be of great help.
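A sketch of the reverse step: one neutral source rendered two ways, HTML for a web page and plain text for email. Again Python stands in for the XSLT stylesheets; the `HTML_RULES` table and the element choices are hypothetical, and each output target (site A, site B, RTF, phone) would get its own set of rules.

```python
import xml.etree.ElementTree as ET
from html import escape

NEUTRAL = ("<article><title>Plea for help</title>"
           "<author>Walter Torres</author><para>Some body text.</para></article>")

# Hypothetical rendering rules for one site: neutral element -> HTML element.
HTML_RULES = {"title": "h1", "author": "address", "para": "p"}

def render_html(neutral_text):
    """Render neutral XML as an HTML fragment, one block per child element."""
    article = ET.fromstring(neutral_text)
    parts = []
    for el in article:
        tag = HTML_RULES.get(el.tag, "div")  # unknown elements fall back to <div>
        parts.append(f"<{tag}>{escape(el.text or '')}</{tag}>")
    return "\n".join(parts)

def render_text(neutral_text):
    """Render the same neutral XML as plain text, e.g. for an email body."""
    article = ET.fromstring(neutral_text)
    return "\n\n".join((el.text or "").strip() for el in article)

print(render_html(NEUTRAL))
print(render_text(NEUTRAL))
```

Keeping the presentation rules in per-output tables, rather than in the source, is what lets the same content feed several sites and formats.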

Right.

Thanks

Walter



======================================================================
Wendell Piez                   mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
   Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--





