A bug in dtdparse is that it does not handle whitespace properly
in public identifiers. For example, say the following entity
declaration exists:
<!ENTITY % parm.ent PUBLIC "-//Earl Hood//ENTITIES
A newline in the PubID//EN">
The newline is kept when the declaration is parsed, but when
dtdparse attempts to resolve the pubid, the lookup fails because
the newline/whitespace is not collapsed to a single space before
the catalog is searched.
An immediate fix to the problem is the following patch:
<patch>
--- Catalog.pm.org Sat Feb 15 19:21:53 2003
+++ Catalog.pm Sat Feb 15 19:27:33 2003
@@ -84,6 +84,7 @@ sub system_map {
sub public_map {
my($self, $pubid) = @_;
+ $pubid =~ s/\s+/ /g;
return $self->_find('PUBID', $pubid);
}
@@ -111,6 +112,7 @@ sub reverse_public_map {
sub declaration {
my($self, $pubid) = @_;
+ $pubid =~ s/\s+/ /g;
foreach my $dir (@{$self->{'DIRECTIVE'}}) {
my %hash = %{$dir};
</patch>
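To see the effect of the substitution outside of Catalog.pm, here is a
minimal standalone sketch. The helper name normalize_pubid is
hypothetical and is not part of dtdparse; it only applies the same
s/\s+/ /g that the patch adds to public_map and declaration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Catalog.pm): collapse every run of
# whitespace, including newlines, to a single space, as the patch does.
sub normalize_pubid {
    my ($pubid) = @_;
    $pubid =~ s/\s+/ /g;
    return $pubid;
}

# The public identifier from the entity declaration above, with the
# embedded newline preserved as the parser would see it.
my $raw = "-//Earl Hood//ENTITIES\nA newline in the PubID//EN";

print normalize_pubid($raw), "\n";
# prints: -//Earl Hood//ENTITIES A newline in the PubID//EN
```

Note that this only collapses whitespace; it does not trim leading or
trailing whitespace, and neither does the patch. Whether trimming is
also needed depends on how the identifiers appear in the catalog.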
Years ago, I wrote a complete(?) SGML Open Catalog parser module.
If there is any interest in it, just let me know.
--ewh