xsl-list

RE: XML Parser bug/error report

2004-04-08 08:45:02
Hi Peiyun,

Two reasons you haven't gotten more responses to your excellent and interesting question:

1. Strictly speaking, it's off topic for this list, not being an XSLT question (or at any rate not framed as such).

2. What you're actually asking about is validation, which is a vast and interesting set of problems.

Almost any solid introductory book on XML technologies will discuss this, sometimes in detail. Not only that, but it's directly linked to the similarly vast problem of XML schema technologies and how to apply them. So I can't really digest the state of the art in this post. But I can suggest:

* You make, and internalize, the distinction between "well-formed" and "valid", if this distinction isn't already clear to you. Again, a decent book, or even some smart searching for these terms, will help. You could have either problem with incoming data. They are not the same kind of problem.

* You identify the nature of the problems you are seeing. Specifically, what do you mean when you say "Some XML files are coming out of commercial software, but cannot pass parsing using apache Java parsers"? This is not unheard-of (not all commercial software is equally good), but what does "cannot pass parsing" mean? Here the critical distinction is between mere syntactic correctness (well-formedness) and other kinds of correctness (say, an XML element is present that you don't want to allow, even though the file is well-formed), which can be lumped together (fans of schema technologies please don't quibble!) as problems with "validity".
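To make the distinction concrete, here is a minimal Java sketch using the JAXP SAX API that ships with the JDK (its built-in parser is derived from Apache Xerces). The class name and the two tiny documents are made up for illustration. A non-validating parse accepts the first document because it is well-formed; a validating parse rejects it because its content breaks its own DTD; the second document fails either way because it isn't well-formed at all.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedVsValid {

    // Well-formed (every tag is properly nested and closed), but NOT valid:
    // its internal DTD allows only <b> children inside <a>.
    static final String WELL_FORMED_BUT_INVALID =
        "<?xml version='1.0'?>"
        + "<!DOCTYPE a [<!ELEMENT a (b)*> <!ELEMENT b (#PCDATA)> <!ELEMENT c (#PCDATA)>]>"
        + "<a><c>not allowed here</c></a>";

    // Not well-formed: <b> is never closed.
    static final String NOT_WELL_FORMED = "<a><b></a>";

    /** Returns "OK" if the parse succeeds, otherwise the first problem reported. */
    static String parse(String xml, boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating);
        SAXParser parser = factory.newSAXParser();
        DefaultHandler handler = new DefaultHandler() {
            // DefaultHandler silently ignores recoverable (validity) errors,
            // so rethrow them here to make them visible.
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        };
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return "OK";
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("non-validating, invalid doc:   " + parse(WELL_FORMED_BUT_INVALID, false));
        System.out.println("validating, invalid doc:       " + parse(WELL_FORMED_BUT_INVALID, true));
        System.out.println("non-validating, malformed doc: " + parse(NOT_WELL_FORMED, false));
    }
}
```

The point is that the same file can be "fine" or "broken" depending on which question you ask the parser.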

So the answer to your question can be quite complex, depending on how you define what a "problem" is. Problems with well-formedness will always be problems. Problems with "validity" (however you define that in your case) can be problems too. Depending on all kinds of issues specific to you, including not only your application domain and architecture but also what technologies you are partial to, you will have a range of ways of dealing with these.

You also ask:

How can you catch the information when error occurs and continue parsing?

It depends: there is a range of different ways of dealing with this, depending on what you mean by "error". In particular, you need to understand the difference between validating parsers and non-validating parsers (and parsers that can go either way with a switch).
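As a rough sketch of "catch the error and continue", assuming Java and DTD validation through JAXP: SAX separates recoverable errors (validity problems, reported through ErrorHandler.error) from fatal errors (well-formedness problems, reported through fatalError). A handler that records the former instead of throwing lets a validating parser run to the end and collect every validity problem with its line and column, which is roughly the kind of report you want to send back to authors. The class name and sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ErrorCollector extends DefaultHandler {

    // Hypothetical submission: well-formed, but <entry> and <note> are undeclared.
    static final String SAMPLE = "<?xml version='1.0'?>\n"
        + "<!DOCTYPE list [<!ELEMENT list (item)*> <!ELEMENT item (#PCDATA)>]>\n"
        + "<list>\n"
        + "  <item>ok</item>\n"
        + "  <entry>undeclared</entry>\n"
        + "  <item>ok</item>\n"
        + "  <note>also undeclared</note>\n"
        + "</list>\n";

    // One line per problem, e.g. "error at 5:10: ...", for the file's author.
    final List<String> problems = new ArrayList<>();

    private void record(String kind, SAXParseException e) {
        problems.add(kind + " at " + e.getLineNumber() + ":" + e.getColumnNumber()
            + ": " + e.getMessage());
    }

    @Override
    public void warning(SAXParseException e) { record("warning", e); }

    // Validity errors are recoverable: recording them instead of throwing
    // lets the parser keep going and report every one it finds.
    @Override
    public void error(SAXParseException e) { record("error", e); }

    // Well-formedness errors are fatal: record, then stop the parse.
    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        record("fatal", e);
        throw e;
    }

    static List<String> check(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // validate against the document's DTD
        ErrorCollector collector = new ErrorCollector();
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), collector);
        } catch (SAXParseException fatal) {
            // Already recorded by fatalError(); nothing more to read.
        }
        return collector.problems;
    }

    public static void main(String[] args) throws Exception {
        for (String p : check(SAMPLE)) {
            System.out.println(p);
        }
    }
}
```

The design choice is to stop only on fatal errors: the XML Recommendation does not allow a processor to continue normal processing after a well-formedness error, so for a malformed file you should expect a report that ends at the first such error.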

Is it ever possible?

Yes; and in some applications it's routine and indispensable. That's why there's such a broad range of approaches, as well as ways to "roll your own". (If your XML is well-formed, you can even use XSLT for this, or XSLT-based validation technologies such as Rick Jelliffe's Schematron.)
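For a flavor of the roll-your-own XSLT approach, here is a sketch that runs a small stylesheet through the JDK's javax.xml.transform API. This is not Schematron itself, only a hand-written stylesheet in the same spirit: each xsl:for-each expresses one rule and emits a line of text for every node that breaks it. The rules (a non-numeric price, a forbidden discontinued element) and the class name are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltRuleCheck {

    // Hypothetical rules: <price> must be numeric; <discontinued> is not allowed.
    // Each xsl:for-each is one rule; each match emits one line of the report.
    static final String RULES =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//price[string(number(.)) = \"NaN\"]'>"
      + "<xsl:text>price is not a number: </xsl:text><xsl:value-of select='.'/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "<xsl:for-each select='//discontinued'>"
      + "<xsl:text>element not allowed: discontinued&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Returns the violation report; an empty string means no rule fired. */
    static String check(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(RULES)));
        StringWriter out = new StringWriter();
        // Throws TransformerException if the input is not well-formed.
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><item><price>9.99</price></item>"
            + "<item><price>ten</price><discontinued/></item></catalog>";
        System.out.print(check(xml));
    }
}
```

Because the XSLT processor has to parse its input first, a malformed file makes transform() throw before any rule runs, which is why this approach presupposes well-formedness.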

How easy is it to implement such a mechanism, if possible?

Again, it depends on the capabilities of your tools along with how you define "error".
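As one more illustration of how much depends on the tool: if your tools support W3C XML Schema, the javax.xml.validation API (in the JDK since Java 5) validates against an XSD and reports problems through the same SAX ErrorHandler interface, so it can also collect errors and keep going. The schema, element names, and class name below are hypothetical; a real schema would come from the file's producer or from your own requirements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XsdCheck {

    // Hypothetical schema: <order> holds one or more integer <qty> elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='order'><xs:complexType><xs:sequence>"
      + "<xs:element name='qty' type='xs:integer' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static List<String> check(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add("warning: " + e.getMessage());
            }

            // Schema-validity errors are recoverable: record and continue.
            @Override
            public void error(SAXParseException e) {
                problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }

            // Well-formedness errors still stop validation.
            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e;
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        for (String p : check("<order><qty>1</qty><qty>two</qty></order>")) {
            System.out.println(p);
        }
    }
}
```

A real program would compile the Schema once and reuse it, since compiled Schema objects are designed to be shared; it is rebuilt on every call here only to keep the sketch short.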

If you're using Java, you should look at ER Harold's _XML Bible_ (in print), and his site http://www.cafeconleche.org for references and commentary.

That's enough from me on an off-topic post!

Cheers,
Wendell

At 10:41 AM 4/8/2004, you wrote:
The APIs I use are adequate for myself to debug my program.

What happens is that the program is OK, but the XML file is not.

I want to give a report to the XML file author when the XML file is submitted
(online) and parsed. The authors can have some hints on where the problems
are if any. What're the best practices on this? Some XML files are coming
out of commercial software, but cannot pass parsing using apache Java
parsers. Is this common?

How can you catch the information when error occurs and continue parsing? Is
it ever possible? How easy is it to implement such a mechanism, if possible?


======================================================================
Wendell Piez                            
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


