RE: Building a new work group for public information retrieval protocol, ask for advices.

2004-01-09 13:13:17
There used to exist a DASL WG and its main proposal included a framework
for sending Search requests to Web servers (including WebDAV servers).
The framework was complemented by one proposed syntax for searching
the resources stored directly on that server according to their
metadata values.  Another syntax for the same framework could
easily allow some Web servers to act as search aggregators or
proxies for other repositories, so that results could include not
just a larger group of Web servers but also non-HTTP URLs.

Now the DASL WG doesn't exist, but the work still continues on 
the WebDAV WG mailing list.

http://www.webdav.org/dasl/

Lisa

-----Original Message-----
From: owner-ietf(_at_)ietf(_dot_)org [mailto:owner-ietf(_at_)ietf(_dot_)org] On Behalf Of wang liang
Sent: Wednesday, January 07, 2004 11:21 PM
To: ietf(_at_)ietf(_dot_)org
Subject: Building a new work group for public information retrieval protocol, ask for advices.


After posting the message "propose some information retrieval
protocols for Internet", we received a lot of advice. We now want to
form a new working group on this topic and are asking for further
advice. Information retrieval may overtake e-mail and become the most
important service on the Internet, so we cannot neglect it.

The reason to form a working group for public information retrieval
protocols lies in the shortcomings of current commercial search
engines and in the improvements a future public search system could
offer.

The faults of commercial search engines.

1 In technology. No search engine today can cover 60% of all the
pages on the Internet, and the average update interval of their web
page databases is almost one month. This is mainly because none of
them can keep up with the explosive growth of web pages on the
Internet. But web pages are only one kind of information resource.
There are many others, such as video, special databases, and BBS. Can
you imagine a single search engine company efficiently managing all
of these information resources?

2 In business model. Many search engine companies are now concerned
with how to profit from corporate customers through advertising and
paid ranking prominence, but never consider how their real customers
feel. Search engines were originally tools for the convenience of
Internet users, but search engine companies have to sell advertising
or ranking prominence, both of which get in the way of information
retrieval, in order to survive. In other words, search engines make
money at the cost of inconveniencing most Internet users rather than
through the quality of their search service.

3 Apart from search engines, Internet services such as e-mail, BBS,
and FTP are all based on public protocols. There is no secret
technology in these services. But information retrieval, which may be
the most important service on the Internet, is still dominated by a
few search engine companies. Many experts know the basic "Pages
Ranking" algorithm, but no one knows its details, which are a
commercial secret. Without public oversight, there is no truly
impartial ranking algorithm. But we all know another world-famous
algorithm very well: "money can elevate ranking score". This may not
comply with the basic rules of the Internet, a public and free world.

4 In any free market, the customers should always come first, not a
handful of companies.


The improvements in a new public search system, DRIS (Domain Resource
Integrated System)

1 In technology. DRIS will build the information retrieval
infrastructure of the Internet. DRIS applies a hierarchical
distributed architecture to manage all the information on the
Internet, just like DNS. Its main principle is (organization
level-conventional database system)-(main sub country Internet
level-metadata harvest system)-(country level-distributed search
system). In simple terms, taking web pages as an example, every DRIS
server at the bottom level, such as a university, will download and
index all the web pages in its local network and then send the
metadata to the higher layer. All other resources are integrated in
the same way. DRIS will therefore improve the performance of Internet
search in recency, coverage, and so on.

2 Management. Who will control DRIS? It is administered by none of us
and by all of us. DRIS is managed by its users and coordinated by a
public organization, just as DNS is. Every organization is both its
customer and its builder. This is the true spirit of the Internet.
DRIS is an open system that needs no profit from its users and, of
course, no advertisements.

3 The basic idea of DRIS: "search should be an internal function of
the Internet and everyone should have his own search engine". DRIS
only provides raw search results (like the results of current search
engines). Many intelligent search systems can use DRIS as their data
source and provide high-quality personal or commercial search
services. So commercial search engines can still survive, in the way
they should.

4 Although DRIS offers an excellent and promising solution for a new
public Internet search system, that alone cannot ensure that DRIS
gets established. One important principle in technology is that the
best technology is the one that meets an urgent demand in society.
This is the secret of DRIS. In our testbed, at the organization
level, only a few universities have a web search engine for their
campus network, to say nothing of a union search system that can
efficiently integrate all their information resources such as FTP,
BBS, and special library databases. That is the demand at the third
layer. Sharing information resources between different organizations
is also attractive, and that is the demand for building the second
layer of DRIS. At the top layer, integrating all the information
resources on the Internet may be everyone's dream.

5 Practice is the only criterion for judging a theory. We have now
built some experimental third-layer DRIS servers in HuBei Province. I
can only say that this is how things should be.
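
The three-layer principle described in item 1 above can be sketched in
a few lines of Python. This is only an illustration under stated
assumptions, not the DRIS implementation: the class names
(OrganizationServer, RegionalServer, CountryServer), the Record
fields, and the example URLs are all hypothetical, and keyword
matching stands in for real indexing. The bottom layer holds its own
records, the middle layer harvests metadata from the organizations
below it, and the top layer forwards each query to the regional
servers and merges the answers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Metadata for one resource; the fields are illustrative only."""
    url: str
    keywords: frozenset


@dataclass
class OrganizationServer:
    """Bottom layer: indexes the resources of one organization."""
    name: str
    records: list = field(default_factory=list)

    def export_metadata(self):
        # Hand a copy of the local metadata to the layer above.
        return list(self.records)


@dataclass
class RegionalServer:
    """Middle layer: harvests metadata from its organizations."""
    name: str
    organizations: list = field(default_factory=list)
    index: list = field(default_factory=list)

    def harvest(self):
        # Rebuild the regional index from every organization below.
        self.index = [
            record
            for org in self.organizations
            for record in org.export_metadata()
        ]

    def search(self, keyword):
        return [r.url for r in self.index if keyword in r.keywords]


@dataclass
class CountryServer:
    """Top layer: holds no index, only forwards queries and merges."""
    name: str
    regions: list = field(default_factory=list)

    def search(self, keyword):
        hits = []
        for region in self.regions:
            for url in region.search(keyword):
                if url not in hits:  # drop duplicates, keep order
                    hits.append(url)
        return hits


uni_a = OrganizationServer(
    "university-a",
    [Record("http://a.example/ir", frozenset({"retrieval", "course"}))],
)
uni_b = OrganizationServer(
    "university-b",
    [Record("http://b.example/lib", frozenset({"library", "retrieval"}))],
)
region = RegionalServer("region", [uni_a, uni_b])
region.harvest()
country = CountryServer("country", [region])

print(country.search("retrieval"))
# ['http://a.example/ir', 'http://b.example/lib']
```

Note that a regional server answers only from what it last harvested,
so recency in this sketch depends on how often harvest() is run.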



Protocol Series of DRIS (for work group)

Description of Working Group:

With the rapid increase in web pages, the coverage of search engines
will become poorer and the update interval much longer. If the
current search engine architecture remains in use, finding precise
and comprehensive information will become impossible in the future.
This problem will be more serious when IPv6 technology is widely
deployed in communication networks. The problem of "too much
information means no information" may become a disaster as
information explodes. To solve this problem, the Internet needs an
efficient information management system.

In this group, the Domain Resource Integrated System (DRIS) will be
proposed. DRIS is a distributed information retrieval system that
will build the information retrieval infrastructure for the Internet
and can also be regarded as a kind of Internet information management
system.

DRIS is a hierarchical distributed search system and comprises three
kinds of information retrieval system: conventional database systems,
distributed search systems, and metadata harvest systems. We will
first define the basic search systems and then define the entire
DRIS.

Specific work items are:

1 Standard distributed search system. It defines a
platform-independent search interface and a collection description
standard for heterogeneous information resources. An I-D,
"information retrieval protocol for digital resources", has been
proposed.
(http://www.ietf.org/internet-drafts/draft-liang-irpdl-03.txt)

2 Standard metadata harvest system. A protocol based on an available
open standard such as OAI will be proposed. It will define a standard
metadata format that is compatible with most database systems.

3 Standard public web pages search system. There are many kinds of
database system. As long as they provide the standard distributed
search interface or comply with the metadata harvest format, they can
be brought into DRIS at the appropriate layer. But web pages are
special because of their distributed nature and astronomical number.
To integrate the web pages on the Internet efficiently, DRIS will
build a public, open web page database, which will strictly follow
the principle of (organization level-conventional database
system)-(sub country Internet level-metadata harvest system)-(country
level-distributed system).
(More information: Make search become the internal function of
Internet. http://arxiv.org/abs/cs.IR/0311015)

4 DRIS. It will define the entire DRIS, including its overall
architecture, the relations between different nodes, etc.
(More information: Evolution: Google vs. GRIS.
http://arxiv.org/abs/cs.DL/0312024)

5 DRIS and IPv6. Cooperation with the IPv6 WG will be proposed. IPv6
will be the most distinctive feature of the next-generation Internet.
IPv6 is still being improved, and any technology that can benefit the
Internet can be added to it. Since searching is the main service for
most Internet users, and this service is not satisfactory on the
current Internet, why not take this requirement into account when
building the new Internet? For example, in IPv6 all kinds of data
flows are assigned a priority, so the Internet could guarantee a high
priority to DRIS data flows. The relation between DRIS and IPv6
therefore needs some consideration.
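
To make work item 1 more concrete, here is a minimal Python sketch of
what a platform-independent search interface with a collection
description might look like. It is not taken from the I-D: the names
SearchService, CollectionDescription, InMemoryService, and
federated_search, the resource_type field, and the example URLs are
all assumptions made for illustration. The point is only that
heterogeneous resources (web pages, FTP archives, library databases)
can sit behind one interface, and that the description lets a caller
decide which collections to query.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CollectionDescription:
    """What a collection holds; the fields are hypothetical."""
    name: str
    resource_type: str  # e.g. "web", "ftp", "library"


class SearchService(Protocol):
    """The uniform interface every participating system would expose."""

    def describe(self) -> CollectionDescription: ...

    def search(self, query: str) -> list[str]: ...


class InMemoryService:
    """Stand-in backend: a dict mapping identifiers to text."""

    def __init__(self, description: CollectionDescription,
                 documents: dict[str, str]):
        self._description = description
        self._documents = documents

    def describe(self) -> CollectionDescription:
        return self._description

    def search(self, query: str) -> list[str]:
        # Case-insensitive substring match, returned in sorted order.
        needle = query.lower()
        return sorted(
            doc_id
            for doc_id, text in self._documents.items()
            if needle in text.lower()
        )


def federated_search(services, query, resource_type=None):
    """Query each service whose description matches; group hits by name."""
    results = {}
    for service in services:
        description = service.describe()
        if (resource_type is not None
                and description.resource_type != resource_type):
            continue  # the description says this collection is not wanted
        results[description.name] = service.search(query)
    return results


web = InMemoryService(
    CollectionDescription("campus-web", "web"),
    {"http://www.example.edu/ir.html": "Information retrieval course notes"},
)
ftp = InMemoryService(
    CollectionDescription("campus-ftp", "ftp"),
    {"ftp://ftp.example.edu/pub/ir.tar": "information retrieval toolkit"},
)

print(federated_search([web, ftp], "retrieval"))
print(federated_search([web, ftp], "retrieval", resource_type="ftp"))
```

Any backend that offers describe() and search() could be plugged in
the same way, whatever database system it runs on.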

Detailed information about DRIS can be found at
http://www.lib.hust.edu.cn/dl-lib/English/main.htm

We welcome further advice. Thanks.