ht://Dig Copyright © 1995-1999 The ht://Dig Group
Please see the file COPYING for
license information.
| <SELECT NAME="search_algorithm">
<OPTION VALUE="exact:1 prefix:0.6 synonyms:0.5 endings:0.1" SELECTED>fuzzy
<OPTION VALUE="exact:1">exact
</SELECT> |
| allow_in_form: search_algorithm search_results_header |
| bad_querystr: forum=private section=topsecret&passwd=required |
| bad_word_list: ${common_dir}/badwords.txt |
| common_url_parts: http://www.htdig.org/ml/ \
    .html \
    http://dev.htdig.org/ \
    http://www.htdig.org/ |
The default value of this attribute is determined at compile time.
| doc_db: ${database_base}documents.db |
| endings_affix_file: /var/htdig/affix_rules |
| endings_dictionary: /var/htdig/dictionary |
| endings_root2word_db: /var/htdig/r2w.db |
| endings_word2root_db: /var/htdig/w2r.bm |
The parser program takes four command-line
parameters, not counting any parameters already
given in the command string:
infile content-type URL configuration-file
| Parameter | Description | Example |
|---|---|---|
| infile | A temporary file with the contents to be parsed. | /var/tmp/htdext.14242 |
| content-type | The MIME-type of the contents. | text/html |
| URL | The URL of the contents. | http://www.htdig.org/attrs.html |
| configuration-file | The configuration-file in effect. | /etc/htdig/htdig.conf |
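
As a minimal sketch of the calling convention in the table above, an external parser written in Python could unpack its arguments as follows. The names `ParserArgs` and `parse_args` are hypothetical, and the sketch assumes that the four parameters come after any options supplied in the command string.

```python
from typing import NamedTuple, Sequence


class ParserArgs(NamedTuple):
    """The four parameters passed to the external parser."""
    infile: str
    content_type: str
    url: str
    config_file: str


def parse_args(argv: Sequence[str]) -> ParserArgs:
    """Take the last four arguments (argv excludes the program name).

    Assumption: options from the command string precede the four parameters.
    """
    if len(argv) < 4:
        raise ValueError("expected: infile content-type URL configuration-file")
    return ParserArgs(*argv[-4:])


# Example values taken from the table; "-w" stands in for an option
# that was given in the command string.
args = parse_args(["-w", "/var/tmp/htdext.14242", "text/html",
                   "http://www.htdig.org/attrs.html", "/etc/htdig/htdig.conf"])
print(args.content_type)  # prints text/html
```

In a real parser script the list would be `sys.argv[1:]`.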
The external parser is to write information for
htdig on its standard output.
The output consists of records, each terminated
with a newline. Each record is a series of tab-separated
fields, which must be non-empty unless expressly allowed
to be empty. The first field is a single character
that specifies the record type. The remaining fields
are determined by the record type.
| Record type | Fields | Description |
|---|---|---|
| w | word | A word that was found in the document. |
| | location | A number indicating the normalized location of the word within the document. The number has to fall in the range 0-1000, where 0 means the top of the document. |
| | heading level | A heading level that is used to compute the weight of the word depending on its context in the document itself. The level is in the range 0-10. |
| u | document URL | A hyperlink to another document that is referenced by the current document. It must be complete and non-relative, using the URL parameter to resolve any relative references found in the document. |
| | hyperlink description | For HTML documents, this would be the text between the <a href...> and </a> tags. |
| t | title | The title of the document. |
| h | head | The top of the document itself. This is used to build the excerpt. It should contain only normal ASCII text. |
| a | anchor | The label that identifies an anchor that can be used as a target in a URL. This really only makes sense for HTML documents. |
| i | image URL | A URL that points at an image that is part of the document. |
| m | http-equiv | The HTTP-EQUIV attribute of a META tag. May be empty. |
| | name | The NAME attribute of this META tag. May be empty. |
| | contents | The CONTENT attribute of this META tag. May be empty. |
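
The record format described above can be sketched in Python as follows. The helpers `record` and `word_records` are hypothetical. Scaling the word index into the 0-1000 range is only one plausible way to produce a normalized location, and replacing embedded tabs and newlines with spaces is an assumption made to keep each record on a single line.

```python
import sys
from typing import List


def record(rtype: str, *fields: object) -> str:
    """Build one record: type character, tab-separated fields, newline."""
    # Assumption: tabs or newlines inside a field would break the format,
    # so they are replaced with spaces.
    cleaned = [str(f).replace("\t", " ").replace("\n", " ") for f in fields]
    return "\t".join([rtype, *cleaned]) + "\n"


def word_records(text: str, heading_level: int = 0) -> List[str]:
    """Emit one 'w' record per word, with locations scaled into 0-1000."""
    words = text.split()
    last = max(len(words) - 1, 1)  # avoid division by zero for one word
    return [record("w", w, (i * 1000) // last, heading_level)
            for i, w in enumerate(words)]


out = [record("t", "Example page"),
       record("u", "http://www.htdig.org/attrs.html", "Configuration attributes"),
       *word_records("Searching with htdig")]
sys.stdout.writelines(out)  # htdig reads the records from standard output
```

For the three-word input the sketch assigns locations 0, 500 and 1000, all with heading level 0.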
| external_parsers: text/html /usr/local/bin/htmlparser \
    application/ms-word "/usr/local/bin/mswordparser -w" |
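
To illustrate how the external_parsers example reads as pairs of content type and command, here is a rough Python sketch. The function `parse_external_parsers` is hypothetical and is not how htdig itself reads the attribute; it assumes that a quoted string groups a command with its options and that a trailing backslash continues the line.

```python
import shlex
from typing import Dict


def parse_external_parsers(value: str) -> Dict[str, str]:
    """Map each content type to its parser command."""
    # Join backslash-continued lines into one logical line.
    logical = value.replace("\\\n", " ")
    # shlex keeps a quoted command and its options together as one token.
    tokens = shlex.split(logical)
    if len(tokens) % 2:
        raise ValueError("expected content-type/command pairs")
    return dict(zip(tokens[0::2], tokens[1::2]))


value = ('text/html /usr/local/bin/htmlparser \\\n'
         '    application/ms-word "/usr/local/bin/mswordparser -w"')
parsers = parse_external_parsers(value)
print(parsers["application/ms-word"])
```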
| htnotify_sender: bigboss@yourcompany.com |
| http_proxy: http://proxy.bigbucks.com:3128 |
| http_proxy_exclude: http://intranet.foo.com/ |
| keywords_meta_tag_names: keywords description |
| limit_normalized: http://www.mydomain.com |
| local_urls: http://www.foo.com/=/usr/www/htdocs/ |
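
The local_urls example pairs a URL prefix with a filesystem prefix, separated by `=`. A minimal Python sketch of that mapping, using a hypothetical helper `to_local_path` and assuming a plain prefix substitution, might look like this:

```python
from typing import Optional


def to_local_path(url: str, local_urls: str) -> Optional[str]:
    """Return the local file path for url, or None if no prefix matches."""
    for entry in local_urls.split():
        # Each entry has the form URL-prefix=path-prefix.
        prefix, _, path = entry.partition("=")
        if prefix and url.startswith(prefix):
            return path + url[len(prefix):]
    return None


setting = "http://www.foo.com/=/usr/www/htdocs/"
print(to_local_path("http://www.foo.com/docs/index.html", setting))
```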
| local_user_urls: http://www.my.org/=/home/,/www/ |
| metaphone_db: ${database_base}.mp.db |
| next_page_text: <img src="/htdig/buttonr.gif"> |
| no_page_list_header: <hr noshade size=2>All results on this page.<br> |
| no_page_number_text: <strong>1</strong> <strong>2</strong> \
    <strong>3</strong> <strong>4</strong> \
    <strong>5</strong> <strong>6</strong> \
    <strong>7</strong> <strong>8</strong> \
    <strong>9</strong> <strong>10</strong> |
| nothing_found_file: /www/searching/nothing.html |
| page_number_text: <em>1</em> <em>2</em> \
    <em>3</em> <em>4</em> \
    <em>5</em> <em>6</em> \
    <em>7</em> <em>8</em> \
    <em>9</em> <em>10</em> |
| prev_page_text: <img src="/htdig/buttonl.gif"> |
| remove_default_doc: default.html default.htm index.html index.htm |
or
| remove_default_doc: |
| search_algorithm: exact:1 soundex:0.3 |
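
The search_algorithm value is a space-separated list of `name:weight` items. As a small illustration, the following Python sketch splits such a value into pairs; the function `parse_search_algorithm` is hypothetical and simply raises `ValueError` when an item has no numeric weight.

```python
from typing import List, Tuple


def parse_search_algorithm(value: str) -> List[Tuple[str, float]]:
    """Split 'exact:1 soundex:0.3' into (algorithm, weight) pairs."""
    pairs = []
    for item in value.split():
        name, _, weight = item.partition(":")
        pairs.append((name, float(weight)))  # float("") raises ValueError
    return pairs


print(parse_search_algorithm("exact:1 soundex:0.3"))
```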
| search_results_footer: /usr/local/etc/ht/end-stuff.html |
| search_results_header: /usr/local/etc/ht/start-stuff.html |
| search_results_wrapper: ${common_dir}/wrapper.html |
| server_aliases: foo.mydomain.com:80=www.mydomain.com:80 \
    bar.mydomain.com:80=www.mydomain.com:80 |
| sort_names: score 'Best Match' time Newest title A-Z \
    revscore 'Worst Match' revtime Oldest revtitle Z-A |
| star_blank: http://www.somewhere.org/icons/elephant.gif |
| star_image: http://www.somewhere.org/icons/elephant.gif |
| star_patterns: http://www.sdsu.edu /sdsu.gif \
    http://www.ucsd.edu /ucsd.gif |
| start_url: http://www.somewhere.org/alldata/index.html |
| synonym_dictionary: /usr/dict/synonyms |
| syntax_error_file: ${common_dir}/synerror.html |
| template_map: Short short ${common_dir}/short.html \
    Normal normal builtin-long \
    Detailed detail ${common_dir}/detail.html |
| url_part_aliases: http://www.htdig.org/ *1 \
    http://www.htdig.org/ml/ *2 \
    http://dev.htdig.org/ *3 .html *4 |
| word_list: ${database_base}.allwords.text |