ht://Dig Copyright © 1995-1999 The ht://Dig Group
Please see the file COPYING for
license information.
The system performs three major tasks, which should be carried out in the following order: digging, merging, and searching.
Digging is the first step in creating a search database. The
ht://Dig system uses the word digging where other systems say
harvesting or gathering. In ht://Dig, the program htdig performs
the information-gathering stage. During this process, htdig acts
like a regular web user, except that it follows the hyperlinks
it comes across. (Strictly speaking, it does not follow all of
them, only those within the domain it needs to gather
information on.) Each document it visits is examined, and all
the unique words in that document are extracted and stored.
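The crawl described above can be sketched in Python. This is not htdig's actual code; it is a minimal breadth-first sketch that assumes a hypothetical `fetch` callable returning a page's HTML (a dictionary of canned pages stands in for HTTP here), approximates "within the domain" as an exact host-name match, and treats a word as any run of lowercase letters and digits.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAndTextParser(HTMLParser):
    """Collects href targets and text content from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def dig(start_url, fetch, allowed_host):
    """Breadth-first crawl restricted to one host.

    fetch: hypothetical callable mapping a URL to an HTML string, or None.
    Returns {url: set of unique lowercased words in that document}.
    """
    seen = {start_url}
    queue = deque([start_url])
    words_by_url = {}
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:  # unreachable or missing document
            continue
        parser = LinkAndTextParser()
        parser.feed(page)
        parser.close()  # flush any trailing text still buffered
        text = " ".join(parser.text_parts).lower()
        words_by_url[url] = set(re.findall(r"[a-z0-9]+", text))
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            target = urljoin(url, href).split("#")[0]
            # Only follow links that stay on the allowed host.
            if urlparse(target).hostname == allowed_host and target not in seen:
                seen.add(target)
                queue.append(target)
    return words_by_url


# Usage: canned pages stand in for real HTTP fetches.
pages = {
    "http://www.example.com/": '<a href="/a.html">Dig here</a> <a href="http://other.org/">out</a>',
    "http://www.example.com/a.html": "<p>Dig deeper, dig again</p>",
}
result = dig("http://www.example.com/", pages.get, "www.example.com")
```

The link to `other.org` is recorded on the first page but never fetched, because its host differs from the allowed one.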
The digging process creates at least two files: a list of all the words, and a database of URLs with information about each URL.
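To make the two outputs concrete, the sketch below turns per-document word sets (as a dig might produce them) into a word list and a URL database. The record layouts here, a sorted list of `(word, doc_id)` pairs and a dictionary of hypothetical `UrlInfo` entries, are illustrative assumptions and not the actual file formats htdig writes.

```python
from dataclasses import dataclass


@dataclass
class UrlInfo:
    """Hypothetical per-URL record kept in the URL database."""
    doc_id: int
    unique_words: int


def build_dig_outputs(words_by_url):
    """Split dig results into a word list and a URL database.

    words_by_url: {url: set of unique words}.
    Returns (word_list, url_db): word_list is a sorted list of
    (word, doc_id) pairs, and url_db maps each URL to its UrlInfo.
    """
    url_db = {}
    word_list = []
    # Assign document IDs in sorted URL order so the output is deterministic.
    for doc_id, url in enumerate(sorted(words_by_url)):
        words = words_by_url[url]
        url_db[url] = UrlInfo(doc_id=doc_id, unique_words=len(words))
        word_list.extend((word, doc_id) for word in words)
    word_list.sort()
    return word_list, url_db


def word_list_lines(word_list):
    """Render the word list as tab-separated text lines, one per entry."""
    return [f"{word}\t{doc_id}" for word, doc_id in word_list]


word_list, url_db = build_dig_outputs({
    "http://www.example.com/": {"dig", "here"},
    "http://www.example.com/a.html": {"dig", "deeper"},
})
```

Storing a small document ID in the word list, rather than the full URL, keeps that list compact; the URL database maps the ID back to everything else known about the document.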
Once the digging process has completed, the gathered information needs to be converted into something the search engine can actually use. The htmerge program uses the information from previous digs to create a database that the search engine can use. It is called 'merge' because it takes data from several databases and merges them into several other databases. The source databases include not only the databases created by the digging process but also previously merged databases. These old databases are used when the digging process produced information only for documents that have changed, so that unchanged documents keep their existing entries.
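The update case can be illustrated with an inverted index that maps each word to the set of URLs containing it. This sketch assumes the new dig covers only changed documents (`changed_docs`) plus, optionally, URLs that have disappeared (`removed_urls`, a hypothetical parameter). Entries for those URLs are dropped from the old merged index and the fresh data is added back in. It shows the idea only and does not reflect htmerge's real database layout.

```python
def merge(old_index, changed_docs, removed_urls=()):
    """Fold an update dig into a previously merged inverted index.

    old_index:    {word: set of URLs} from the previous merge.
    changed_docs: {url: set of words} for documents re-dug because they changed.
    removed_urls: hypothetical list of URLs that no longer exist.
    Returns a new {word: set of URLs}; old_index is not modified.
    """
    stale = set(changed_docs) | set(removed_urls)
    merged = {}
    # Keep old entries only for documents that did not change.
    for word, urls in old_index.items():
        kept = urls - stale  # set difference builds a new set
        if kept:
            merged[word] = kept
    # Add the fresh data for changed documents.
    for url, words in changed_docs.items():
        for word in words:
            merged.setdefault(word, set()).add(url)
    return merged


old = {"dig": {"u1", "u2"}, "old": {"u2"}}
updated = merge(old, {"u2": {"dig", "new"}})
```

Here document `u2` no longer contains "old", so that word disappears from the result, while `u1`, which was not re-dug, keeps its entry for "dig".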
There are several optional tasks which also fit into the merge phase:
Searching is where users actually get to use all the information that was gathered during the dig and merge stages. The htsearch program performs the actual searches and produces the HTML output that the users see.
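A search over such an index, followed by HTML rendering, might look like the sketch below. It assumes a simple boolean AND over whitespace-separated query words and a bare list of links as output; htsearch's actual matching rules and result templates are not modeled here.

```python
import html


def search(index, query):
    """Return URLs containing every word of the query (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


def render_results(query, urls):
    """Produce a minimal HTML results fragment, escaping user-supplied text."""
    items = "\n".join(
        f'<li><a href="{html.escape(url)}">{html.escape(url)}</a></li>'
        for url in sorted(urls)
    )
    return f"<h2>Results for {html.escape(query)}</h2>\n<ul>\n{items}\n</ul>"


index = {"dig": {"u1", "u2"}, "new": {"u2"}}
page = render_results("dig new", search(index, "dig new"))
```

Escaping the query before echoing it back matters because the query string comes straight from the user and would otherwise be injected into the results page as raw HTML.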