ietf-openproxy

Re: OPES Rules Language

2005-06-09 13:24:29

At 19:33 09/06/2005, Alex Rousskov wrote:
On Thu, 2005/06/09 (MDT), <info(_at_)utel(_dot_)net> wrote:
At 18:55 09/06/2005, Alex Rousskov wrote:
On Thu, 2005/06/09 (MDT), <info(_at_)utel(_dot_)net> wrote:

I have some difficulty understanding how you trigger the adaptation, then?

The proxy/processor configuration will have a mechanism to specify
what rules/code to apply and where to apply them. Different processors
will have different invocation points and different specification
mechanisms (e.g., access control lists or hard-coded triggers).

This does not make the language universal.

No language can be universal. Think about it. Somewhere the language
scope ends and the language environment scope begins. For example,
the C++ standard does not specify when C++ programs are executed,
while shell scripts do not care what language the programs they run
were written in.

The reason to remove invocation points from the rules language
is simple: all existing proxy implementations already have their own
configuration language that determines invocation points. Usually,
it comes in the form of an ACL of some sort. Apache, Cisco, NetApp,
sendmail, etc. all have that. Trying to change or replace that
language is fruitless, IMHO.

On the other hand, providing a universal language to describe what
(if anything) happens at the selected invocation point does have a
[slim] chance of being deployed, because no popular implementations
(related to HTTP and ICAP) have any good knobs for that. The SMTP
world has Sieve and Milter.

Let me think that over and look at the implementation.

Any objection to having both?
Obviously I can add them, but that would be bad if the language ends up
in an ISO standard, which would be great.

Sorry, I do not understand this part. Perhaps you can give a specific
ISO requirement you are trying to satisfy?

We (the Internet and the world at large) have a problem with language identification. At the IETF, W3C, and ISO there are roughly three approaches, fiercely disputed, because the people supporting the first two do not (yet) have a network vision, and there is a lot of money in the first one.

- one is from the publishers' point of view (books and programs). It goes from characters (Unicode) to the computer (locales) and pages (XML, HTML), up to an IANA language registration stewardship that gives the registrant a commercial advantage. That approach uses concepts ("English", "Latin script", ccTLDs for country/market, etc.). This is fine for classifying items in a cupboard or in a directory. It is operational. It should extend from 400 to 7,500 languages. This should become ISO 639-3, to be approved in August.

- another is consistent with the ISO standards. ISO 12620 defines the data elements; ISO 11179 defines the registries. We are talking about precise rules: a script is defined by its charset, a language is tested by statistics on recurrent words, etc. (a rough sketch of that test follows this list). We are no longer working on concepts but on values which can be used by computers (and by an OPES, to test the language of an unknown message/page and massage it: translating it, entering notes, classifying it, etc.). The base used in that project includes 20,000 languages. This should become ISO 639-6, with big work ahead.

Both use ASCII string IDs for languages (2 or 3 characters in the first approach, 4 in the second). Neither of them can support multilingualism. The first one is deliberately ASCII and English oriented (the table) and wants to replace RFC 3066. The second is fully open to multilingualism, but work on multilingualism has not yet been carried out in ISO 11179. It is supported by governments, R&D, etc.

- the last one (my team's) starts from the smart-user-in-a-network's point of view. We cannot accept the first one alone. The second one is OK, but with some conceptual additions and a lot of wording simplification; this is complex if we want to stay fully compatible with the metamodels we may have to hook into. We will give all the language IDs a number (an additional column) and we index twice, so we can come either from the numeric ID or from a string. We make the numeric language ID an IPv6 interface ID and the string language ID a domain name; access is therefore very fast and proven. When we call a language (by IPv6 address or by name) we call the registry of that language. The response we get gives us all the registered details of the language (it can be an XML page or an ASN.1 structure, or we can add the sub-address of one element to get only one word). Addresses can hold the information themselves, or be recursive and point to another address if the information is shared or in case of an update. This concept should be documented, hopefully in ISO 639-4, this year or after having been demonstrated next year.
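
To make the second approach's "statistics on recurrent words" test concrete, here is a toy sketch in Python; the word lists are invented samples, not real registry data:

# Toy sketch: language identification by recurrent-word statistics.
# The per-language word lists are invented examples.
RECURRENT_WORDS = {
    "eng": {"the", "of", "and", "to", "in"},
    "fra": {"le", "de", "et", "la", "les"},
}

def guess_language(text):
    """Pick the language whose recurrent-word list overlaps the text most."""
    words = set(text.lower().split())
    return max(RECURRENT_WORDS, key=lambda lang: len(words & RECURRENT_WORDS[lang]))

print(guess_language("le chat et la souris"))  # fra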
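
And a minimal sketch of the dual-index idea above (numeric ID embedded in an IPv6 interface ID, string ID mapped to a domain name); the registry content, prefix, and names are all invented for illustration:

import ipaddress

# Invented toy registry: every language has both a number and a string ID,
# indexed both ways.
BY_NUMBER = {48201: "fra-vernacular-example"}
BY_STRING = {name: number for number, name in BY_NUMBER.items()}

# IPv6 documentation prefix; a real registry would use its own /64.
REGISTRY = ipaddress.IPv6Network("2001:db8::/64")

def numeric_id_to_address(lang_number):
    """Embed the numeric language ID in the low 64 bits (the interface ID)."""
    return REGISTRY.network_address + lang_number

def string_id_to_name(lang_string):
    """Map the string language ID onto a domain name under the registry."""
    return f"{lang_string}.lang.example"

# Either index reaches the same registry entry.
print(numeric_id_to_address(BY_STRING["fra-vernacular-example"]))  # 2001:db8::bc49
print(string_id_to_name("fra-vernacular-example"))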

My idea is to say that the Sieve/P language is a language to parse a text. Let us stabilize its commands as a metamodel (XML fields) conforming to the ISO 11179 standard (that is OK, no problem, but it needs careful wording). This means that we will have documented a universal way for a user, in every language, to enter a script to work with his texts. This kind of thing is already developed for lexicons, so the concepts and tools are there. It may take time to fill in all the entries (we actually need 100 entries for each script, and then to refine them; Wikipedia could help a lot): this would be the first multilingual computer language.
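
A rough sketch of what such per-language command lexicons could look like, mapping local spellings onto one canonical set of operations (all entries are invented placeholders):

# Canonical operations of the (hypothetical) stabilized command set.
CANONICAL = {"keep", "discard", "fileinto"}

# Per-language lexicons: local command spellings -> canonical operations.
LEXICONS = {
    "eng": {"keep": "keep", "discard": "discard", "fileinto": "fileinto"},
    "fra": {"garder": "keep", "jeter": "discard", "classer": "fileinto"},
}

def to_canonical(tokens, lang):
    """Rewrite a script written in one language's vocabulary into canonical commands."""
    lexicon = LEXICONS[lang]
    return [lexicon.get(token, token) for token in tokens]

print(to_canonical(["classer", "inbox"], "fra"))  # ['fileinto', 'inbox']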

This means that any user, on any keyboard, could enter and save a script to have an OPES working on his language, all of it documented in ISO standards.

Now you realise that the OPES will use a CRC (context reference center, the system where we will locate the registry) as a call-out server. This CRC can belong to a mailing list, for example. It can document a vernacular version of the language being used (dictionary, ontology, syntax, grammar, etc.). For example, each time we copy someone from outside the OPES mailing list and use "OPES", the system could add, in the language of the e-mail, a footnote explaining to the recipient what an OPES is, and possibly correct my Franglish into readable English.
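
For instance, a toy sketch of that footnote service; the roster, texts, and message shape are all invented:

# Toy call-out filter: when a message mentioning "OPES" goes to someone
# outside the list, append an explanatory footnote in the message's language.
LIST_MEMBERS = {"member@example.org"}  # invented roster

FOOTNOTES = {
    "eng": "Note: OPES stands for Open Pluggable Edge Services.",
}

def annotate(message):
    outside = [rcpt for rcpt in message["to"] if rcpt not in LIST_MEMBERS]
    if outside and "OPES" in message["body"]:
        note = FOOTNOTES.get(message.get("lang", "eng"))
        if note:
            message["body"] += "\n-- \n" + note
    return message

msg = {"to": ["outsider@example.com"], "body": "About OPES rules.", "lang": "eng"}
print(annotate(msg)["body"])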

Was I clear enough?

For that to work, I think we need a language that is simple enough and neutral with regard to ASCII strings (both the commands and the processed text). This is also why, at this stage, I would be happy to keep triggers. There are enough tricky things which may happen in language logic (bidi, for example) that one feels more comfortable having more control.
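
As one illustration of the kind of control I mean, a minimal sketch of a trigger that refuses to rewrite mixed-direction (bidi) text; the is_bidi test and the rule shape are my own invention:

import unicodedata

def is_bidi(text):
    """True if the text mixes left-to-right and right-to-left characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return bool(classes & {"R", "AL"}) and "L" in classes

def triggered_rewrite(text, rewrite):
    # Explicit trigger: leave mixed-direction text alone rather than risk
    # scrambling its display order.
    return text if is_bidi(text) else rewrite(text)

print(triggered_rewrite("plain text", str.upper))  # PLAIN TEXT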

Obviously, if it does not work, I will drop the request ... but it would be great if it did: for the service provided, for the fun, and for being the first attempt.
jfc

