Hi Peiyun,
Two reasons you haven't gotten more responses to your excellent and
interesting question:
1. Strictly speaking, it's off topic for this list, not being an XSLT
question (or at any rate not framed as such).
2. What you're actually asking about is validation, which is a vast and
interesting set of problems.
Almost any solid introductory book on XML technologies will discuss this,
sometimes in detail. Not only that, but it's directly linked to the
similarly vast problem of XML schema technologies and how to apply them. So
I can't really digest the state of the art in this post. But I can suggest:
* You make, and internalize, the distinction between "well-formed" and
"valid", if this distinction isn't already clear to you. Again, a decent
book, or even some smart searching for these terms, will help. You could
have either problem with incoming data. They are not the same kind of problem.
* You identify the nature of the problems you are seeing. Specifically, what
is meant when you say "Some XML files are coming out of commercial
software, but cannot pass parsing using apache Java parsers"? This is not
unheard-of (not all commercial software is equally good), but what's meant
by "cannot pass parsing"? Here, the distinction is critical between mere
syntactic correctness (well-formedness), and other kinds of correctness
(say an XML element is present that you don't want to allow, even though
the file is well-formed), which can be lumped together (fans of schema
technologies please don't quibble!) as problems with "validity".
So the answer to your question can be quite complex, depending on how you
define what a "problem" is. Problems with well-formedness will always be
problems. Problems with "validity" (however you define that in your case)
can be problems too. Depending on all kinds of issues specific to you,
including not only your application domain and architecture but also what
technologies you are partial to, you will have a range of ways of dealing
with these.
You also ask:
How can you catch the information when error occurs and continue parsing?
It depends: there is a range of different ways of dealing with this,
depending on what you mean by "error". In particular, you need to
understand the difference between validating parsers and non-validating
parsers (and parsers that can go either way with a switch).
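
To make the distinction concrete, here is a minimal sketch in Java using the standard JAXP/SAX API, assuming the incoming document carries a DTD to validate against. The class name XmlProblemReport and the methods check and describe are hypothetical, made up for illustration. The "switch" is setValidating: with it on, validity errors arrive through error() and the parser carries on, so you can collect every one of them. A well-formedness error arrives through fatalError(), and after that the parser will not continue normal processing of the document, so at most one of those gets reported per run.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlProblemReport {

    /** Parses the XML text and returns one line per problem found (hypothetical helper). */
    public static List<String> check(String xml, boolean validate) {
        final List<String> problems = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add(describe("warning", e));
            }

            @Override
            public void error(SAXParseException e) {
                // Validity errors are recoverable: record them and return,
                // and the parser keeps going to the end of the document.
                problems.add(describe("invalid", e));
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXParseException {
                // Well-formedness errors are fatal: record the first one and stop.
                problems.add(describe("not well-formed", e));
                throw e;
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validate); // the validating / non-validating switch
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // Already recorded by fatalError() above.
        } catch (Exception e) {
            problems.add("could not parse: " + e);
        }
        return problems;
    }

    private static String describe(String kind, SAXParseException e) {
        return kind + " at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }
}
```

Calling XmlProblemReport.check(text, true) on a submitted file would give you a list of lines you could pass back to its author. The wording of each message comes from the parser and will differ between parsers.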
Is it ever possible?
Yes; and in some applications it's routine and indispensable. That's why
there's such a broad range of approaches, as well as ways to "roll your
own". (If your XML is well-formed you can even use XSLT for this or
XSLT-based validation technologies like Rick Jelliffe's Schematron.)
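
As a rough illustration of the "roll your own" XSLT route, the sketch below runs a tiny stylesheet from Java through the standard javax.xml.transform API and returns its text output as a report. This is plain XSLT 1.0, not Schematron itself, though Schematron works along broadly similar lines. The rule (every item element must have an id attribute), the element names, and the class name XsltRuleCheck are all assumptions invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltRuleCheck {

    // Hypothetical rule: every <item> must carry an id attribute.
    // The stylesheet emits one line of text per offending element.
    private static final String RULES =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='//item[not(@id)]'>"
        + "<xsl:text>item </xsl:text>"
        + "<xsl:value-of select='count(preceding::item) + 1'/>"
        + "<xsl:text> has no id&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Returns the report text; an empty string means no rule was broken. */
    public static String report(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(RULES)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // XSLT only gets to run on well-formed input; anything else ends up here.
            throw new IllegalStateException("input could not be transformed", e);
        }
    }
}
```

The point of the design is that the rules live in the stylesheet, so they can be changed without touching the Java code. The counting expression assumes the item elements are not nested inside one another.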
How easy is it to implement such a mechanism if possible.
Again, it depends on the capabilities of your tools along with how you
define "error".
If you're using Java, you should look at Elliotte Rusty Harold's _XML Bible_
(in print), and his site http://www.cafeconleche.org for references and commentary.
That's enough from me on an off-topic post!
Cheers,
Wendell
At 10:41 AM 4/8/2004, you wrote:
The APIs I use are adequate for myself to debug my program.
What happens is that the program is OK, but the XML file is not.
I want to give a report to the XML file author when the XML file is submitted
(online) and parsed. The authors can have some hints on where the problems
are if any. What're the best practices on this? Some XML files are coming
out of commercial software, but cannot pass parsing using apache Java
parsers. Is this common?
How can you catch the information when error occurs and continue parsing? Is
it ever possible? How easy is it to implement such a mechanism if possible.
======================================================================
Wendell Piez
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================