mail-ng

Re: a few short notes

2004-02-03 03:05:57

At 08:16 03/02/2004, Iljitsch van Beijnum wrote:
I wrote a program in C that takes a Netscape bookmark file and stores the content in a database. This is just under 300 lines and it's pretty stupid; it certainly can't handle all variations of HTML.

It could be lack of programming prowess on my part, but I find parsing HTML / XML syntax incredibly inconvenient. The most troublesome part is that you can't just work left to right; you have to look for close tags and so on.

Also, it just doesn't make any sense.

Why is it <input type=blah> but <title>blah</title>? Something like input="blah" title="blah" would be much better.

I couldn't agree more. I wrote a basic XML parser which could do just what we needed and no more and it was 259 lines long (that's not counting the STL libraries it used). This built up a tree structure of the XML. There was even more code to grab the particular XML element I wanted from that tree structure. I can't see how you could do it in 12 lines apart from just to find a specific tag value.
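
To illustrate the "find a specific tag value" case, here is a rough sketch in Python (not the C++ parser described above). The function name find_tag_value is made up for this example, and it assumes the tag has no attributes and is not nested inside itself. It ignores comments, CDATA and entities, which is where a real parser picks up its extra lines.

```python
def find_tag_value(text, tag):
    """Return the text between the first <tag> and the next </tag>, or None."""
    open_tag = "<" + tag + ">"
    close_tag = "</" + tag + ">"
    start = text.find(open_tag)
    if start == -1:
        return None  # opening tag not present
    start += len(open_tag)
    end = text.find(close_tag, start)
    if end == -1:
        return None  # opening tag was never closed
    return text[start:end]
```

This does no tree building at all, so it cannot answer anything about structure; it only pulls out one value.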

OTOH, my RFC822 header email-address-aware parser is only 220 lines long. (If I didn't need to parse email addresses it would be MUCH shorter). A parser for a better designed plain text metadata format could easily be in the region of 50 lines or less.
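
As a sketch of how small such a parser could be, the following Python assumes a purely hypothetical format of one name="value" pair per line, with '#' comment lines and no double quotes allowed inside a value. Neither the format nor the name parse_metadata comes from any proposal on this list; they are invented for the example.

```python
def parse_metadata(text):
    """Parse lines of the form name="value" into a dict, one line at a time."""
    result = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split at the first '=': the name is on the left, the quoted value on the right.
        name, sep, rest = line.partition("=")
        name = name.strip()
        rest = rest.strip()
        well_formed = (
            sep == "="
            and name != ""
            and len(rest) >= 2
            and rest[0] == '"'
            and rest[-1] == '"'
            and '"' not in rest[1:-1]  # assumption: no quotes inside a value
        )
        if not well_formed:
            raise ValueError('line %d: expected name="value"' % lineno)
        result[name] = rest[1:-1]
    return result
```

For example, parse_metadata('input="blah"\ntitle="blah"') returns {'input': 'blah', 'title': 'blah'}. Each line is handled on its own, with no close tags to look for.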

I don't like RFC822 headers, but I think there are simpler alternatives to XML which I'd prefer. I wouldn't die if it did turn out to be XML, but I'd like a good reason rather than 'it's the new way of doing things'.


Paul                            VPOP3 - Internet Email Server/Gateway
support(_at_)pscs(_dot_)co(_dot_)uk                     http://www.pscs.co.uk/


