
DynaWeb Publishers Guide

Creating a Publisher's Thesaurus

Publishers can create a thesaurus that defines specific links between the words used in their documentation. Inso provides a command-line program called makept that compiles the Publisher's Thesaurus into a format that DynaWeb and DynaText can read. This section describes the steps required to create a publisher's thesaurus and use it with your DynaWeb installation.

File Format of the Publisher's Thesaurus

The Publisher's Thesaurus contains all of the potential query terms (called head terms) and all of the words that DynaWeb will look for when executing an expanded search. The input file for a Publisher's Thesaurus is a tag-delimited ASCII file. It uses the single-byte Windows character set.

Syntax

The Publisher's Thesaurus contains a series of entries that denote all possible head terms and their related words using a tag-based structure. A sample thesaurus is shown below:

Sample Thesaurus

<HEADER>
   <LANG>ENGLISH
</HEADER>
<TOPIC>
   <HT>boat</HT>
    <TERM>ship<TYPE>RT<SOA>95<POS>N</TERM>
    <TERM>sailboat<TYPE>RT<SOA>80<POS>N</TERM>
</TOPIC>

The thesaurus is grouped into topics. Each topic contains the head term, followed by the related terms for that word. The table below explains each of the tags used in a Publisher's Thesaurus.

| Tag Name | Explanation | Range of Values | Contained Within | Contains |
|---|---|---|---|---|
| <HEADER>, </HEADER> | Contains information about the current thesaurus. | None. | None. | <LANG> |
| <LANG> | The language of the thesaurus. | Uppercase English spellings of languages. Ex: "ENGLISH", "FRENCH" (quotes not necessary) | <HEADER> | None. |
| <TOPIC>, </TOPIC> | Surrounds the definition of an entire topic, including all the terms that are defined for the topic. | None. | None. | <HT>, <TERM> |
| <HT>, </HT> | Delimits the head term for a topic. Only one head term may be defined per topic. | None. | <TOPIC> | None. |
| <TERM>, </TERM> | Delimits an expansion term for the current <TOPIC>. | None. | <TOPIC> | <TYPE>, <SOA>, <POS> |
| <TYPE> | Declares the type of relationship between the current term and the head term. | BT (Broader Term), NT (Narrower Term), RT (Related Term), SP (Spelling Variant), SY (Synonym) | <TERM> | None. |
| <SOA> | Defines the strength of association between the term and the head term. | 1-99 | <TERM> | None. |
| <POS> (optional) | Defines the Part Of Speech for a term. | N (Noun), J (Adjective), V (Verb), R (Adverb), O (Other - None - N/A) | <TERM> | None. |

Using the above format, create a file that contains all of the proprietary and specialized terms you want people to be able to use in expanded searches.
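
For a large term list, it may be easier to generate the input file than to type it by hand. The following is a minimal Python sketch, not part of DynaWeb, that renders the tag structure described in the table and rejects values outside the documented ranges. The names Term, render_term, and render_thesaurus are hypothetical and exist only for this illustration; the indentation simply mirrors the sample thesaurus above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Allowed values taken from the tag table in this section.
VALID_TYPES = {"BT", "NT", "RT", "SP", "SY"}
VALID_POS = {"N", "J", "V", "R", "O"}


@dataclass
class Term:
    """One expansion term for a topic (hypothetical helper type)."""
    word: str
    type: str                  # relationship to the head term, e.g. "RT"
    soa: int                   # strength of association, 1-99
    pos: Optional[str] = None  # part of speech; <POS> is optional


def render_term(term: Term) -> str:
    """Render one <TERM> line, validating TYPE, SOA, and POS."""
    if term.type not in VALID_TYPES:
        raise ValueError(f"unknown TYPE {term.type!r} for {term.word!r}")
    if not 1 <= term.soa <= 99:
        raise ValueError(f"SOA {term.soa} out of range 1-99 for {term.word!r}")
    if term.pos is not None and term.pos not in VALID_POS:
        raise ValueError(f"unknown POS {term.pos!r} for {term.word!r}")
    pos = f"<POS>{term.pos}" if term.pos else ""
    return f"    <TERM>{term.word}<TYPE>{term.type}<SOA>{term.soa}{pos}</TERM>"


def render_thesaurus(language: str, topics: Dict[str, List[Term]]) -> str:
    """Render a header plus one <TOPIC> per head term."""
    lines = ["<HEADER>", f"   <LANG>{language}", "</HEADER>"]
    for head_term, terms in topics.items():
        lines.append("<TOPIC>")
        lines.append(f"   <HT>{head_term}</HT>")  # exactly one head term per topic
        lines.extend(render_term(t) for t in terms)
        lines.append("</TOPIC>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_thesaurus(
        "ENGLISH",
        {"boat": [Term("ship", "RT", 95, "N"), Term("sailboat", "RT", 80, "N")]},
    )
    print(text, end="")
```

Run as a script, this prints the same entries as the sample thesaurus. When saving the result to disk, keep in mind that the input file uses the single-byte Windows character set; an encoding such as cp1252 is a plausible choice, but that specific code page is an assumption rather than something this guide states.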

Compiling the Publisher's Thesaurus

Your DynaWeb installation includes a tool called makept. It takes the ASCII file containing your thesaurus entries and compiles it into a database format. The syntax for makept is:

makept -I [/path/to/input_file] [/path/to/output_file]

For example, to compile a thesaurus file named thes.dat into a file called newpt.dat, enter:

makept -I thes.dat newpt.dat

Give the output file a different name from the input file; otherwise, you will overwrite the input file.

Note: On UNIX, make sure your LD_LIBRARY_PATH is set to the absolute path to the lib directory before running makept. The lib directory is located under the [platform] directory of your DynaWeb installation.
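
If you compile the thesaurus from a build script, the two cautions above (distinct file names, and LD_LIBRARY_PATH on UNIX) can be enforced before makept runs. The Python sketch below is one possible wrapper under stated assumptions: the function names are hypothetical, makept is assumed to be on the PATH, and check=True assumes makept reports failure through a nonzero exit status, which this guide does not confirm. Only the -I flag and argument order shown above are taken from the documentation.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def build_makept_command(input_file: str, output_file: str,
                         makept: str = "makept") -> List[str]:
    """Build the documented command line: makept -I input output."""
    # Refuse to run if both names point at the same file, since the
    # output would overwrite the input.
    if Path(input_file).resolve() == Path(output_file).resolve():
        raise ValueError("output file must differ from the input file")
    return [makept, "-I", input_file, output_file]


def build_env(lib_dir: Optional[str] = None) -> Dict[str, str]:
    """Copy the current environment, setting LD_LIBRARY_PATH if requested."""
    env = dict(os.environ)
    if lib_dir is not None:
        # The note above asks for an absolute path to the lib directory.
        env["LD_LIBRARY_PATH"] = str(Path(lib_dir).resolve())
    return env


def compile_thesaurus(input_file: str, output_file: str,
                      lib_dir: Optional[str] = None) -> None:
    """Run makept; raises CalledProcessError on a nonzero exit (assumed)."""
    cmd = build_makept_command(input_file, output_file)
    subprocess.run(cmd, env=build_env(lib_dir), check=True)
```

A call such as compile_thesaurus("thes.dat", "newpt.dat", lib_dir=...) would then correspond to the command shown above, with lib_dir pointing at the lib directory under the [platform] directory of your installation.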

Referencing the Publisher's Thesaurus

In order to add your new publisher's thesaurus to the thesauri that DynaWeb checks during an expanded query, you need to modify the qeconfig.dat file. This file is located in the data/qe directory under your DynaWeb installation. The format of the qeconfig.dat file looks like this:

<config>
   <dbset>
      <db type="ise">en/ise.dat</>
      <db type="isecache">en/cache.dat</>
      <db type="isenoise">en/noise.dat</>
      <db type="ciet">en/thes.dat</>
   </dbset>
</config>

If you have multiple language thesauri installed with DynaWeb, there will be additional <dbset> statements -- one for each language.

You will need to add a line to the <dbset> statement for the language of your publisher's thesaurus. The syntax for the line is:

<db type="pt">path/to/publishers_thesaurus</>

Substitute the location of your publisher's thesaurus for "path/to/publishers_thesaurus" and save the file.

For example, to add a new thesaurus called newpt.dat to the list of English thesauri, change the qeconfig.dat file to look like this:

<config>
   <dbset>
      <db type="ise">en/ise.dat</>
      <db type="isecache">en/cache.dat</>
      <db type="isenoise">en/noise.dat</>
      <db type="ciet">en/thes.dat</>
      <db type="pt">en/newpt.dat</>
   </dbset>
</config>

Your thesaurus is now added to the search path for DynaWeb. Try it out!
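
The qeconfig.dat change can also be applied by a script. Because the file closes elements with the short form </>, a standard XML parser may not accept it, so the Python sketch below works on the text line by line. The function add_publisher_thesaurus and its dbset_index parameter are hypothetical; the sketch assumes each </dbset> sits on its own line, as in the examples above, and leaves it to you to choose which <dbset> belongs to your thesaurus language.

```python
def add_publisher_thesaurus(config_text: str, thesaurus_path: str,
                            dbset_index: int = 0) -> str:
    """Insert a <db type="pt"> line before the chosen </dbset> line."""
    entry = f'<db type="pt">{thesaurus_path}</>'
    lines = config_text.splitlines()

    # Do nothing if this thesaurus is already referenced.
    if any(line.strip() == entry for line in lines):
        return config_text

    # Locate every closing </dbset>; one <dbset> exists per language.
    closers = [i for i, line in enumerate(lines) if line.strip() == "</dbset>"]
    if dbset_index >= len(closers):
        raise ValueError("no matching </dbset> found in configuration")
    close_at = closers[dbset_index]

    # Reuse the indentation of the line just above the closing tag.
    previous = lines[close_at - 1] if close_at > 0 else ""
    indent = previous[: len(previous) - len(previous.lstrip())]
    lines.insert(close_at, indent + entry)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    original = (
        "<config>\n"
        "   <dbset>\n"
        '      <db type="ciet">en/thes.dat</>\n'
        "   </dbset>\n"
        "</config>\n"
    )
    print(add_publisher_thesaurus(original, "en/newpt.dat"), end="")
```

Back up qeconfig.dat before writing the modified text to it, since the sketch does no validation beyond locating the closing tag.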

