ietf-openproxy

Re: OPES Rules Language

2005-06-09 13:24:29

At 19:33 09/06/2005, Alex Rousskov wrote:
On Thu, 2005/06/09 (MDT), <info(_at_)utel(_dot_)net> wrote:
At 18:55 09/06/2005, Alex Rousskov wrote:
On Thu, 2005/06/09 (MDT), <info(_at_)utel(_dot_)net> wrote:

I have some difficulty understanding how you trigger the adaptation, then?

The proxy/processor configuration will have a mechanism to specify
what rules/code to apply and where to apply them. Different processors
will have different invocation points and different specification
mechanisms (e.g., access control lists or hard-coded triggers).

This does not make the language universal.

No language can be universal. Think about it. Somewhere the language
scope ends and the language environment scope begins. For example,
the C++ standard does not specify when C++ programs are executed,
while shell scripts do not care what language the programs they run
were written in.

The reason to remove invocation points from the rules language
is simple: all existing proxy implementations already have their own
configuration language that determines invocation points. Usually,
it comes in the form of an ACL of some sort. Apache, Cisco, NetApp,
sendmail, etc. all have that. Trying to change or replace that
language is fruitless, IMHO.

On the other hand, providing a universal language to describe what
(if anything) happens at the selected invocation point does have a
[slim] chance of being deployed, because no popular implementations
(related to HTTP and ICAP) have any good knobs for that. The SMTP
world has Sieve and Milter.

Let me think that over and look at the implementation.

Any objection to having both?
Obviously I can add them, but that would be bad if the language ends up
in an ISO standard, which would be great.

Sorry, I do not understand this part. Perhaps you can give a specific
ISO requirement you are trying to satisfy?

We (the Internet and the world at large) have a problem with language identification. At the IETF, W3C, and ISO there are roughly three approaches, fiercely disputed, because the people supporting the first two do not (yet) have a network vision, and there is a lot of money in the first one.

- one is from the publishers' point of view (books and programs). It goes from characters (Unicode) to the computer (locales) and pages (XML, HTML), up to an IANA language registration stewardship that gives the registrant a commercial advantage. That approach uses concepts ("English", "Latin script", ccTLDs for country/market, etc.). This is fine for classifying items in a cupboard or in a directory. It is operational. It should extend from 400 to 7,500 languages. This should become ISO 639-3, to be approved in August.

- another is consistent with the ISO standards. ISO 12620 defines the data elements; ISO 11179 defines the registries. We are talking about precise rules: a script is defined by its charset, a language is tested by statistics on recurrent words, etc. (a rough sketch of that test follows this list). We are no longer working on concepts but on values which can be used by computers (and by an OPES, to test the language of an unknown message/page and massage it: translating it, entering notes, classifying it, etc.). The base used in that project includes 20,000 languages. This should become ISO 639-6, with big work ahead.

Both use ASCII string IDs for languages (2 or 3 characters in the first approach, 4 in the second). Neither of them can support multilingualism. The first one is deliberately ASCII and English oriented (the table) and wants to replace RFC 3066. The second is fully open to multilingualism, but work on multilingualism has not yet been carried out in ISO 11179. It is supported by governments, R&D, etc.

- the last one (my team's) starts from the smart-user-in-a-network's point of view. We cannot accept the first one alone. The second one is OK, but with some conceptual additions and a lot of wording simplification; this is complex if we want to stay fully compatible with the metamodels we may have to hook into. We will give all the language IDs a number (an additional column) and we index twice, so we can come either from the numeric ID or from a string. We make the numeric language ID an IPv6 interface ID and the string language ID a domain name; access is therefore very fast and proven. When we call a language (by IPv6 address or by name) we call the registry of that language. The response we get gives us all the registered details of the language (it can be an XML page or an ASN.1 structure, or we can add the sub-address of one element to get only one word). Addresses can hold the information themselves, or be recursive and point to another address if the information is shared or in case of an update. This concept should be documented, hopefully in ISO 639-4, this year or after having been demonstrated next year.
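
To make the second approach's "statistics on recurrent words" test concrete, here is a toy sketch in Python; the word lists are invented samples, not real registry data:

# Toy sketch: language identification by recurrent-word statistics.
# The per-language word lists are invented examples.
RECURRENT_WORDS = {
    "eng": {"the", "of", "and", "to", "in"},
    "fra": {"le", "de", "et", "la", "les"},
}

def guess_language(text):
    """Pick the language whose recurrent-word list overlaps the text most."""
    words = set(text.lower().split())
    return max(RECURRENT_WORDS, key=lambda lang: len(words & RECURRENT_WORDS[lang]))

print(guess_language("le chat et la souris"))  # fra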
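
And a minimal sketch of the dual-index idea above (numeric ID embedded in an IPv6 interface ID, string ID mapped to a domain name); the registry content, prefix, and names are all invented for illustration:

import ipaddress

# Invented toy registry: every language has both a number and a string ID,
# indexed both ways.
BY_NUMBER = {48201: "fra-vernacular-example"}
BY_STRING = {name: number for number, name in BY_NUMBER.items()}

# IPv6 documentation prefix; a real registry would use its own /64.
REGISTRY = ipaddress.IPv6Network("2001:db8::/64")

def numeric_id_to_address(lang_number):
    """Embed the numeric language ID in the low 64 bits (the interface ID)."""
    return REGISTRY.network_address + lang_number

def string_id_to_name(lang_string):
    """Map the string language ID onto a domain name under the registry."""
    return f"{lang_string}.lang.example"

# Either index reaches the same registry entry.
print(numeric_id_to_address(BY_STRING["fra-vernacular-example"]))  # 2001:db8::bc49
print(string_id_to_name("fra-vernacular-example"))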

My idea is to say that the Sieve/P language is a language to parse a text. Let us stabilize its commands as a metamodel (XML fields) conforming to the ISO 11179 standard (that is OK, no problem, but it needs careful wording). This means that we will have documented a universal way for a user, in every language, to enter a script to work with his texts. This kind of thing is already developed for lexicons, so the concepts and tools are there. It may take time to fill in all the entries (we actually need 100 entries for each script, and then to refine them; Wikipedia could help a lot): this would be the first multilingual computer language.
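
A rough sketch of what such per-language command lexicons could look like, mapping local spellings onto one canonical set of operations (all entries are invented placeholders):

# Canonical operations of the (hypothetical) stabilized command set.
CANONICAL = {"keep", "discard", "fileinto"}

# Per-language lexicons: local command spellings -> canonical operations.
LEXICONS = {
    "eng": {"keep": "keep", "discard": "discard", "fileinto": "fileinto"},
    "fra": {"garder": "keep", "jeter": "discard", "classer": "fileinto"},
}

def to_canonical(tokens, lang):
    """Rewrite a script written in one language's vocabulary into canonical commands."""
    lexicon = LEXICONS[lang]
    return [lexicon.get(token, token) for token in tokens]

print(to_canonical(["classer", "inbox"], "fra"))  # ['fileinto', 'inbox']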

This means that any user, on any keyboard, could enter and save a script to have an OPES working on his language, all of it documented in ISO standards.

Now you realise that the OPES will use a CRC (context reference center, the system where we will locate the registry) as a call-out server. This CRC can belong to a mailing list, for example. It can document a vernacular version of the language being used (dictionary, ontology, syntax, grammar, etc.). For example, each time we copy someone from outside the OPES mailing list and use "OPES", the system could add, in the language of the e-mail, a footnote explaining to the recipient what an OPES is, and possibly correct my Franglish into readable English.
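
For instance, a toy sketch of that footnote service; the roster, texts, and message shape are all invented:

# Toy call-out filter: when a message mentioning "OPES" goes to someone
# outside the list, append an explanatory footnote in the message's language.
LIST_MEMBERS = {"member@example.org"}  # invented roster

FOOTNOTES = {
    "eng": "Note: OPES stands for Open Pluggable Edge Services.",
}

def annotate(message):
    outside = [rcpt for rcpt in message["to"] if rcpt not in LIST_MEMBERS]
    if outside and "OPES" in message["body"]:
        note = FOOTNOTES.get(message.get("lang", "eng"))
        if note:
            message["body"] += "\n-- \n" + note
    return message

msg = {"to": ["outsider@example.com"], "body": "About OPES rules.", "lang": "eng"}
print(annotate(msg)["body"])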

Was I clear enough?

For that to work, I think we need a language that is simple enough and neutral with regard to ASCII strings (both the commands and the processed text). This is also why, at this stage, I would be happy to keep triggers. There are enough tricky things which may happen in language logic (bidi, for example) that one feels more comfortable having more control.
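
As one illustration of the kind of control I mean, a minimal sketch of a trigger that refuses to rewrite mixed-direction (bidi) text; the is_bidi test and the rule shape are my own invention:

import unicodedata

def is_bidi(text):
    """True if the text mixes left-to-right and right-to-left characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return bool(classes & {"R", "AL"}) and "L" in classes

def triggered_rewrite(text, rewrite):
    # Explicit trigger: leave mixed-direction text alone rather than risk
    # scrambling its display order.
    return text if is_bidi(text) else rewrite(text)

print(triggered_rewrite("plain text", str.upper))  # PLAIN TEXT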

Obviously, if it does not work, I will drop the request ... but it would be great if it did: for the service provided, for the fun, and for being the first attempt.
jfc

