
DynaWeb Publishers Guide

3. Overview of DynaWeb Configuration Files

Query Expansion

DynaWeb can expand queries submitted to the server in both book and collection searches. Publishers can disable any or all query expansion options. Query expansion consists of four main areas: stemming, word compounding and de-compounding, text normalization, and thesaurus support.

Note: When searching Japanese-language books, only the Stemming query enhancement option is applied. Normalization, thesaurus expansion, and compounding are not available for Japanese books at this time.

How DynaWeb handles Query Expansion

A checkbox has been added to the button bar at the collection level and the book level.

Figure 3-2: The Expand Search Checkbox

If this box is selected, DynaWeb automatically expands the query using the current settings for query enhancement in the configuration files. Publishers can decide which options for query enhancement are important for them and turn off expansion options that they don't need.

DynaWeb returns a list of search hits containing the original term, as well as all expanded terms.

There are also new Tcl commands dealing with query expansion. These commands are detailed in the Extended Tcl Command Reference Guide.

Configurable Parameters

Publishers can set some of the query expansion parameters inside the DynaWeb configuration files, including the maximum number of terms to expand and the maximum number of expansion terms to return. These options are set inside the pubpref.dwc configuration file in the data/config directory. They are located under the section "Query Expansion Default Preferences".

The parameters that can be set by modifying the dwSetQueryExpOptions command are:

Name                        Value                                  Default   Function
-Language                   ISO 639 2-character language code      en        Language used to expand the query
-Dialect                    "english", "american", etc.            None      Dialect of the specified query language
-MaxInputTermsToMatch       0-400                                  50        Maximum number of terms in the query to expand
-MaxExpTermsPerInputTerm    0-100                                  50        Maximum number of expansion terms to return for each term in the query
-MinSOA                     0-100                                  50        Minimum "strength of association" for which to return expansion terms. See "Strength of Association" for more details.
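
The ranges and defaults in the parameter table can be checked before they are written to a configuration file. The following is a minimal Python sketch, not part of DynaWeb: the function name check_options and the dictionaries are hypothetical, and only the option names, ranges, and defaults come from the table.

```python
# Hypothetical helper; option names, ranges, and defaults are copied from the table.
RANGES = {
    "-MaxInputTermsToMatch": (0, 400),
    "-MaxExpTermsPerInputTerm": (0, 100),
    "-MinSOA": (0, 100),
}
DEFAULTS = {
    "-Language": "en",
    "-MaxInputTermsToMatch": 50,
    "-MaxExpTermsPerInputTerm": 50,
    "-MinSOA": 50,
}


def check_options(overrides):
    """Merge overrides into the defaults and reject out-of-range numbers."""
    options = dict(DEFAULTS)
    options.update(overrides)
    for name, (low, high) in RANGES.items():
        value = options[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return options
```

For example, check_options({"-MinSOA": 75}) returns the defaults with -MinSOA raised to 75, while a value of 101 for -MaxExpTermsPerInputTerm raises ValueError.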

Enabling or Disabling Query Expansion

The pubpref.dwc file contains the controls for turning query expansion on and off. The pertinent section is shown below:

#Query Expansion Default Preferences
   Language                     EN
   EN_Thesaurus                 1
   EN_Stemming                  1
   EN_PubThesaurus              1
   EN_Compound                  1
   EN_Normalize                 1
   EN_MaxInputTermsToMatch      100
   EN_MaxExpTermsPerInputTerm   50
   EN_MinSOA                    50
   FR_Thesaurus                 1
   FR_Stemming                  1
...

All of the query expansion options are turned on by default in the DynaWeb configuration files. To turn them off, simply pick the option you wish to disable and replace the "1" with "0".
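
Editing the file by hand is usually enough. For publishers who script their configuration, the following Python sketch shows one way to flip a flag in text laid out like the excerpt above. The function set_option is hypothetical, and the sketch assumes each setting sits on its own line as a key followed by whitespace and a 1 or 0.

```python
import re


def set_option(config_text, key, enabled):
    """Return config_text with the flag for `key` set to 1 (on) or 0 (off)."""
    # Match "<indent>KEY<spaces>" followed by a single 0 or 1 at the end of the line.
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]+)[01][ \t]*$", re.MULTILINE
    )
    new_text, count = pattern.subn(rf"\g<1>{1 if enabled else 0}", config_text)
    if count == 0:
        raise KeyError(key)  # the key was not found in the text
    return new_text
```

Calling set_option(text, "EN_Thesaurus", False) leaves every other line untouched and keeps the original column alignment.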

The table below summarizes the query expansion options and their controlling parameters. Replace LANG with the two-letter string for the language you want.

Query Expansion Option              pubpref.dwc Trigger   Values    Default ("1" is on, "0" is off)
Stemming                            LANG_Stemming         [1 | 0]   1
Text Normalization                  LANG_Normalize        [1 | 0]   1
Thesaurus                           LANG_Thesaurus        [1 | 0]   1 (0 for Japanese)
Word Compounding / Decompounding    LANG_Compound         [1 | 0]   1
Publisher's Thesaurus               LANG_PubThesaurus     [1 | 0]   1 (0 for Japanese)
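
The LANG_ naming pattern makes it straightforward to report which options are switched on for a given language. The Python sketch below is illustrative only: enabled_options is a hypothetical helper, and treating a missing key as "off" is an assumption of the sketch rather than documented DynaWeb behavior.

```python
# Option suffixes taken from the "pubpref.dwc Trigger" column.
OPTION_SUFFIXES = ["Stemming", "Normalize", "Thesaurus", "Compound", "PubThesaurus"]


def enabled_options(config_text, lang):
    """Return the option suffixes set to 1 for one language prefix, e.g. "EN"."""
    flags = {}
    for line in config_text.splitlines():
        parts = line.split()
        # Keep only "KEY VALUE" lines; skip comments and anything else.
        if len(parts) == 2 and not parts[0].startswith("#"):
            flags[parts[0]] = parts[1]
    # Assumption: a key that is absent from the file counts as disabled.
    return [s for s in OPTION_SUFFIXES if flags.get(f"{lang}_{s}") == "1"]
```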

Stemming

Stemming is the method of expanding a query to include the various forms of a word, including inflection, uninflection, derivation expansion, and derivation reduction.

Examples of queries that demonstrate the forms of stemming are as follows:
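
As a hedged illustration of the four forms, the Python sketch below pairs each form with a query and the kind of terms it might add. The word pairs are assumptions chosen for illustration and are not taken from DynaWeb's lexicon or output; the function expand is likewise hypothetical.

```python
# Illustrative word pairs only; real expansions depend on the installed lexicon.
STEMMING_EXAMPLES = {
    "inflection": ("walk", ["walks", "walked", "walking"]),
    "uninflection": ("walking", ["walk"]),
    "derivation expansion": ("create", ["creation", "creative", "creator"]),
    "derivation reduction": ("creation", ["create"]),
}


def expand(term):
    """Return the original term followed by every form listed for it."""
    found = [term]
    for query, forms in STEMMING_EXAMPLES.values():
        if query == term:
            found.extend(form for form in forms if form not in found)
    return found
```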

Word Compounding and De-compounding

De-compounding is the ability to decompose a compound word into its single word components and then find occurrences based on the smaller words. This feature does not distinguish between parts of speech.

For example, a search for the German word "leserhandbuch" finds "leser", "handbuch", "buch", "Benutzerhandbuch".
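
The idea can be sketched in Python as follows. This is a simplified illustration, not DynaWeb's algorithm: it assumes a small hypothetical lexicon and uses plain substring matching, both to split the compound and to find indexed words that share a component.

```python
def decompound(word, lexicon):
    """Return lexicon entries found inside `word`, in order of appearance."""
    lowered = word.lower()
    parts = [e for e in lexicon if e.lower() in lowered and e.lower() != lowered]
    # Sort by position in the compound; at the same position, longer parts first.
    parts.sort(key=lambda e: (lowered.find(e.lower()), -len(e)))
    return parts


def find_matches(word, lexicon, indexed_words):
    """Return indexed words that contain any component of the compound."""
    parts = [p.lower() for p in decompound(word, lexicon)]
    return [w for w in indexed_words if any(p in w.lower() for p in parts)]
```

With the lexicon ["buch", "handbuch", "leser", "tisch"], decompound("leserhandbuch", ...) yields "leser", "handbuch", and "buch", and find_matches then also picks up "Benutzerhandbuch" because it contains "handbuch".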

Text Normalization

Text normalization includes query processing for clitic stripping, open and closed ligatures, and expansion of hyphens, slashes, and parentheses.
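
The Python sketch below shows what such normalization might produce for a single query term. It is an assumption-laden illustration: the ligature table is a small hand-picked subset, the clitic rule only strips a leading elided letter such as l' or d', and none of the names come from DynaWeb.

```python
import re

LIGATURES = {"æ": "ae", "œ": "oe", "ﬁ": "fi", "ﬂ": "fl"}  # illustrative subset


def normalize_variants(term):
    """Return the term followed by simple normalized variants, without duplicates."""
    variants = [term]

    def add(candidate):
        if candidate and candidate not in variants:
            variants.append(candidate)

    # Ligatures: offer the spelled-out form.
    expanded = term
    for ligature, letters in LIGATURES.items():
        expanded = expanded.replace(ligature, letters)
    add(expanded)

    # Hyphens and slashes: offer the closed form and the spaced form.
    if re.search(r"[-/]", expanded):
        add(re.sub(r"[-/]", "", expanded))
        add(re.sub(r"[-/]", " ", expanded))

    # Parentheses: offer the term with and without the optional part.
    if "(" in expanded and ")" in expanded:
        add(re.sub(r"[()]", "", expanded))
        add(re.sub(r"\([^)]*\)", "", expanded))

    # Clitics: strip a leading elided letter plus apostrophe.
    add(re.sub(r"^[A-Za-z]'", "", expanded))
    return variants
```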

Thesaurus Support

DynaWeb uses Inso's Concise International Electronic Thesaurus (CIET) technology to support thesaurus-based searching. This feature operates by expanding queries to find synonyms and other related words. This substitution occurs transparently after the user executes the search.

Inso provides thesauri in English, French, Italian, German, and Spanish. There is no Japanese thesaurus at this time. Inso has additional European language thesauri available on special request.

The files that make up the thesauri for DynaWeb reside in the data/qe directory. Subdirectories hold the specific thesaurus files for each language. The possible languages are: English (en), French (fr), Italian (it), German (ge), Spanish (es), and Japanese (ja). If any of these directories is missing, you did not install all of the CIET locales during your DynaWeb installation.

If you wish to install more languages at a later date, rerun the DynaWeb installation into a different directory and copy the data/qe directory structure into the original installation. You will have to re-establish the locations of any Publisher's Thesauri you might have added.

Strength of Association

The concept of strength of association (SOA) is used to quantify the relationship between a query string and the expansion terms listed in the thesaurus. SOA can have a value from 0 (unrelated) to 100 (reserved for the original query string, direct stems, and synonyms). In this manner, related terms can be ranked according to how close they are in meaning to the original term. For example, given an original query for "boat", some related terms in order of decreasing SOA might be: "ship", "sailboat", "trimaran", and "Sunfish". Since the term "ship" has the same meaning as "boat", it would have the strongest SOA. Likewise, since "Sunfish" is a brand of boat, it would be ranked lower than the other terms.
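
The effect of a minimum SOA threshold can be sketched in Python. The SOA numbers used in the usage note are invented for illustration, and the sketch assumes the threshold is inclusive; neither is confirmed by the DynaWeb documentation.

```python
def filter_by_soa(candidates, min_soa=50, max_terms=50):
    """Keep terms with SOA >= min_soa, strongest first, capped at max_terms."""
    kept = [(term, soa) for term, soa in candidates.items() if soa >= min_soa]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in kept[:max_terms]]
```

Given hypothetical scores of 100 for "ship", 80 for "sailboat", 60 for "trimaran", and 40 for "Sunfish", a threshold of 50 keeps the first three terms and drops "Sunfish".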

Order of Expansion

When DynaWeb expands a query with all of the expansion options enabled, it begins by checking the publisher-created thesaurus (if there is one). Then it stems the query terms. When it has finished checking the stems, it uses the Inso-supplied thesauri. In rare cases, query expansion may seem to "ignore" some thesaurus entries because the query term produces more expanded terms than the configured maximum. If this happens, increase the value of -MaxExpTermsPerInputTerm (for example, EN_MaxExpTermsPerInputTerm) in the pubpref.dwc configuration file.
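
The interaction between the expansion order and the per-term limit can be sketched in Python. This is a simplified model under stated assumptions: the three sources are plain dictionaries, the function name is hypothetical, and the way the limit cuts off later sources is inferred from the description above rather than taken from DynaWeb's implementation.

```python
def expand_query_term(term, pub_thesaurus, stems, inso_thesaurus, max_terms):
    """Gather expansions in the documented order and stop once max_terms is reached."""
    expanded = []
    # Order: publisher's thesaurus, then stems, then the Inso-supplied thesaurus.
    for source in (pub_thesaurus, stems, inso_thesaurus):
        for candidate in source.get(term, []):
            if candidate in expanded:
                continue
            if len(expanded) == max_terms:
                return expanded  # later entries are dropped
            expanded.append(candidate)
    return expanded
```

With a limit of 3, a term that has one publisher entry and two stems fills the quota before the Inso thesaurus is consulted, which is how thesaurus entries can appear to be ignored.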

