Oracle8 ConText Cartridge Application Developer's Guide Release 2.3 A58164-01
This chapter describes query expression feedback. The following topics are covered:
Query expression feedback is a feature that enables you to know how ConText parses a text or theme query expression before you execute the query. Knowing how ConText evaluates a text or theme query expression is useful for refining and debugging queries. You can also design your application so that it uses the feedback information to help users write better queries.
The diagram above shows how you use query expression feedback. You execute the PL/SQL procedure CTX_QUERY.FEEDBACK, which generates and stores feedback information to a table. From the data in this feedback table, you can visualize the ConText parse tree to examine how the expression was expanded and parsed. You can then refine the query and re-execute FEEDBACK, or you can execute the real query with CONTAINS for two-step queries, OPEN_CON for in-memory queries, or SELECT for one-step queries.
In text queries, query expression feedback is especially useful for seeing how ConText expands expressions that contain stem, wildcard, thesaurus, fuzzy, soundex, PL/SQL, or SQE operators before you execute the query. Such queries can expand into many tokens or produce very large hitlists, which adds considerable overhead.
In theme queries, query expression feedback is useful for knowing how ConText uses the knowledge catalog to normalize query expressions.
Before ConText executes a query, it parses the expression. The resulting expression can be represented as a parse tree. A ConText parse tree can show:
The output table of the FEEDBACK procedure is a graphical representation of a ConText parse tree.
Parse trees are read in a depth-first manner and from left to right. This means the first operation is always furthest to the left and at the bottom of the branch. In this way, parse trees illustrate operator precedence.
The example above shows the parse tree for the evaluation of a AND b OR c, where a, b, and c stand for three arbitrary words. Because the AND operation a AND b is the leftmost operation and at the bottom of the tree, it is executed first. The parse tree therefore shows that the AND operator has higher precedence than the OR operator, and the resulting query is (a AND b) OR c rather than a AND (b OR c).
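The depth-first, left-to-right reading can be sketched in a few lines of Python. This is only an illustrative model, not ConText's implementation: the nested-tuple tree, the `evaluate` function, and the sample postings are all assumptions made for the example.

```python
# Hypothetical model: a parse tree as nested tuples, evaluated depth-first,
# left to right. The postings below are made-up sample data.
POSTINGS = {
    "a": {1, 2, 3},
    "b": {2, 3, 4},
    "c": {5},
}

def evaluate(node, trace):
    """Return the document IDs matching `node`, visiting children first."""
    if isinstance(node, str):            # leaf: a single word
        return POSTINGS.get(node, set())
    op, left, right = node               # inner node: (operator, left, right)
    left_docs = evaluate(left, trace)
    right_docs = evaluate(right, trace)
    trace.append(op)                     # record the order in which operators run
    if op == "AND":
        return left_docs & right_docs
    return left_docs | right_docs        # "OR"

# a AND b OR c parses as (a AND b) OR c: the AND node sits below the OR node.
tree = ("OR", ("AND", "a", "b"), "c")
trace = []
result = evaluate(tree, trace)
print(trace)           # ['AND', 'OR']
print(sorted(result))  # [2, 3, 5]
```

With the other grouping, `("AND", "a", ("OR", "b", "c"))`, the same sample data would yield documents 2 and 3 only, which is why the shape of the tree matters.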
The above example shows how ConText expands the query comp% OR ?smith. The parse tree shows that before ConText executes the query, the token comp% is expanded to computer and comptroller, while ?smith is expanded to smith and smythe.
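As a rough illustration of the wildcard half of this expansion, the following Python sketch matches a pattern against a list of indexed tokens. The token list and the `expand_wildcard` helper are hypothetical, and only the % wildcard is modeled; fuzzy matching depends on ConText internals and is not reproduced here.

```python
import re

# Hypothetical index vocabulary; a real ConText index would supply these tokens.
INDEX_TOKENS = ["comptroller", "computer", "dog", "smith", "smythe"]

def expand_wildcard(pattern, tokens):
    """Expand a pattern in which % stands for any run of characters."""
    # Escape the literal parts and join them with ".*" where % appeared.
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return sorted(t for t in tokens if re.fullmatch(regex, t))

expansion = expand_wildcard("comp%", INDEX_TOKENS)
print(expansion)  # ['comptroller', 'computer']
```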
ConText parse trees show similar expansions with thesaurus, wildcard, soundex, stem, SQE, and PL/SQL operators. In the case of the wildcard, soundex, and fuzzy operators, ConText obtains the correct word expansions from the index.
Note: When you include the SQE operator in the feedback expression, the feedback (expansion of the stored query expression) is based on the current state of the index and will take into account any inserts, updates, or deletes made to the base table; however, unlike a call to CONTAINS, the stored query expression is not updated or refreshed as a result of the call to FEEDBACK.
You can use query expression feedback to know how ConText interprets theme queries. The feedback information provides the normalized version of the query as obtained from the knowledge catalog.
The example above shows how ConText normalizes the theme query ratified laws to the themes ratification and law. The resulting expression is an AND operation with weights attached to the normal forms: ratification*0.561 AND law*0.438.
The example above shows how ConText optimizes the expression a AND b AND c, where a, b, and c stand for three different words.
In the first step of the parse, ConText evaluates a AND b, then ANDs the result with c. With such a parse tree, ConText must search for all documents that contain a and b, then search for all documents that contain c, and then intersect the two result sets.
The ConText optimizer realizes this query is more efficiently executed by simultaneously searching for all the documents that contain a and b and c, which is illustrated in the second step of the optimizing process.
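One way to picture this optimization is as a tree rewrite that merges nested nodes sharing the same operator into a single n-ary node. The Python sketch below assumes a nested-tuple tree and a hypothetical `flatten` helper; it illustrates the idea only and does not describe how the ConText optimizer is implemented.

```python
ASSOCIATIVE = {"AND", "OR"}   # operators whose nested nodes can safely be merged

def flatten(node):
    """Merge nested nodes that use the same associative operator."""
    if isinstance(node, str):             # leaf: a single word
        return node
    op, *children = node
    merged = []
    for child in map(flatten, children):
        same_op = not isinstance(child, str) and child[0] == op
        if same_op and op in ASSOCIATIVE:
            merged.extend(child[1:])      # hoist the grandchildren into this node
        else:
            merged.append(child)
    return (op, *merged)

before = ("AND", ("AND", "a", "b"), "c")  # first step: (a AND b) AND c
after = flatten(before)                   # second step: one three-way AND
print(after)  # ('AND', 'a', 'b', 'c')
```

Nodes with different operators are left alone, so `("OR", ("AND", "a", "b"), "c")` passes through unchanged.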
The example above shows the parse sequence for the stopword transformation:
non_stopword NOT stopword => non_stopword
Assuming the word that is a stopword, ConText reduces the query dog NOT that to dog.
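The single transformation shown here can be sketched as a small rewrite rule. The stoplist and the `rewrite_not` function below are hypothetical, and the sketch covers only this one rule, not the full set of stopword transformations.

```python
STOPWORDS = {"that", "the", "of"}   # hypothetical stoplist for the example

def rewrite_not(left, right, stopwords=STOPWORDS):
    """Apply non_stopword NOT stopword => non_stopword; otherwise keep the node."""
    if left not in stopwords and right in stopwords:
        return left
    return ("NOT", left, right)

print(rewrite_not("dog", "that"))  # dog
print(rewrite_not("dog", "cat"))   # ('NOT', 'dog', 'cat')
```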
See Also:
To learn more about querying with stopwords, see "Querying with Stopwords" in Chapter 3. For a list of all possible stopword transformations, see Appendix C, "Stopword Transformations".
Before you issue a query, you can obtain the parse tree information for the query expression. The procedure CTX_QUERY.FEEDBACK creates a graphical representation of the parse tree and stores this information in a feedback table, which you create before executing CTX_QUERY.FEEDBACK. To reconstruct ConText parse trees, you must understand the structure of this table.
The feedback table has the following structure:
Column Name | Datatype | Description |
---|---|---|
FEEDBACK_ID | VARCHAR2(30) | The value of the feedback_id argument specified in the FEEDBACK call. |
ID | NUMBER | A number assigned to each node in the query execution tree. The root operation node has ID = 1. The nodes are numbered in a top-down, left-first manner as they appear in the parse tree. |
PARENT_ID | NUMBER | The ID of the execution step that operates on the output of the ID step. Graphically, this is the parent node in the query execution tree. The root operation node (ID = 1) has PARENT_ID = 0. |
OPERATION | VARCHAR2(30) | Name of the internal operation performed. Refer to Table 5-2 for possible values. |
OPTIONS | VARCHAR2(30) | Characters that describe a variation on the operation described in the OPERATION column. When an OPERATION has more than one OPTIONS value associated with it, the OPTIONS values are concatenated in the order of processing. See Table 5-3 for possible values. |
OBJECT_NAME | VARCHAR2(64) | Section name, wildcard term, or term to look up in the index. |
POSITION | NUMBER | The order of processing for nodes that all have the same PARENT_ID. The positions are numbered in ascending order starting at 1. |
CARDINALITY | NUMBER | Reserved for future use. You should create this column for forward compatibility. |
Table 5-2 lists the possible values for the OPERATION column in the feedback table:
Table 5-3 shows the values for the OPTIONS column in the feedback table. When an OPERATION has more than one OPTIONS value associated with it, the OPTIONS values are concatenated in the order of processing.
Options Value | Description |
---|---|
($) | Stem |
(?) | Fuzzy |
(!) | Soundex |
(n) | A number associated with threshold, weight, or max |
(m-n) | First next range (m and n are integers) |
The figure above shows how ConText encodes the parse tree for the query comp% OR ?smith, which asks for all documents that contain words beginning with comp or words that are spelled like smith.
Each node is labeled with a value that corresponds to the OPERATION column in the feedback table. The tree above contains one OR node, two EQUIVALENCE nodes, and four WORD nodes.
The ID and PARENT_ID values are listed beside each node. For example, the OR node has an ID of 1 and PARENT_ID of 0, since it is the root node.
The EQUIVALENCE node with ID = 2, PARENT_ID = 1, has an OBJECT_NAME value of COMP%, because this equivalence operation is a result of wildcard term comp%.
The WORD node with ID = 3 has an OBJECT_NAME value of computer, because in this instance, computer is one of the words that satisfy comp%.
To obtain query expression feedback information, you must do the following:
For example, to create a feedback table called test_feedback, use the following SQL statement:
    create table test_feedback(
      feedback_id  varchar2(30),
      id           number,
      parent_id    number,
      operation    varchar2(30),
      options      varchar2(30),
      object_name  varchar2(64),
      position     number,
      cardinality  number);
To obtain the expansion of a query expression such as comp% OR ?smith, use CTX_QUERY.FEEDBACK as follows:
    ctx_query.feedback(
      policy_name    => 'scott.test_policy',
      text_query     => 'comp% OR ?smith',
      feedback_table => 'test_feedback',
      sharelevel     => 0,
      feedback_id    => 'Test');
To read the feedback table, you can select the columns as follows:
    select feedback_id, id, parent_id, operation, options, object_name, position
    from test_feedback
    order by id;
The output is ordered by ID to simulate a hierarchical query:
    FEEDBACK_ID   ID PARENT_ID OPERATION    OPTIONS OBJECT_NAME POSITION
    ----------- ---- --------- ------------ ------- ----------- --------
    Test           1         0 OR           NULL    NULL               1
    Test           2         1 EQUIVALENCE  NULL    COMP%              1
    Test           3         2 WORD         NULL    COMPTROLLER        1
    Test           4         2 WORD         NULL    COMPUTER           2
    Test           5         1 EQUIVALENCE  (?)     SMITH              2
    Test           6         5 WORD         NULL    SMITH              1
    Test           7         5 WORD         NULL    SMYTHE             2
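An application can use these rows to judge how far a query expands before running it. The Python sketch below counts the WORD nodes under each EQUIVALENCE node, using the sample rows above as in-memory tuples; the `expansion_sizes` helper and the `MAX_EXPANSION` limit are hypothetical and stand in for whatever policy an application chooses.

```python
from collections import Counter

# Sample rows from the output above:
# (id, parent_id, operation, options, object_name, position)
ROWS = [
    (1, 0, "OR", None, None, 1),
    (2, 1, "EQUIVALENCE", None, "COMP%", 1),
    (3, 2, "WORD", None, "COMPTROLLER", 1),
    (4, 2, "WORD", None, "COMPUTER", 2),
    (5, 1, "EQUIVALENCE", "(?)", "SMITH", 2),
    (6, 5, "WORD", None, "SMITH", 1),
    (7, 5, "WORD", None, "SMYTHE", 2),
]

def expansion_sizes(rows):
    """Map each EQUIVALENCE node's OBJECT_NAME to its number of WORD children."""
    names = {r[0]: r[4] for r in rows if r[2] == "EQUIVALENCE"}
    counts = Counter(r[1] for r in rows if r[2] == "WORD" and r[1] in names)
    return {names[node_id]: n for node_id, n in counts.items()}

sizes = expansion_sizes(ROWS)
print(sizes)  # {'COMP%': 2, 'SMITH': 2}

MAX_EXPANSION = 100  # hypothetical application limit, not a ConText setting
too_broad = [term for term, n in sizes.items() if n > MAX_EXPANSION]
print(too_broad)  # []
```

If `too_broad` is not empty, the application could ask the user to narrow the offending terms before it executes the real query.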
You can optionally construct an approximate graphical representation of the parse tree using a hierarchical query. This type of query outputs rows in a hierarchical manner, where children nodes are indented under parent nodes.
The following statement selects from a populated feedback table, indenting the output according to level:
    select lpad(' ',2*(level-1)) || operation operation, options, object_name, position
    from test_feedback
    start with id = 1
    connect by prior id = parent_id;
This statement produces hierarchical output for the query comp% OR ?smith as follows:
    OPERATION            OPTIONS    OBJECT_NAME          POSITION
    -------------------- ---------- -------------------- --------
    OR                   NULL       NULL                        1
      EQUIVALENCE        NULL       COMP%                       1
        WORD             NULL       COMPTROLLER                 1
        WORD             NULL       COMPUTER                    2
      EQUIVALENCE        (?)        SMITH                       2
        WORD             NULL       SMITH                       1
        WORD             NULL       SMYTHE                      2
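If the application has already fetched the feedback rows, it can build the same indented view on the client side. The following Python sketch groups rows by PARENT_ID, orders siblings by POSITION, and walks the tree from the root; the `render` function and the in-memory row layout are assumptions made for illustration.

```python
from collections import defaultdict

# Sample feedback rows: (id, parent_id, operation, object_name, position)
ROWS = [
    (1, 0, "OR", None, 1),
    (2, 1, "EQUIVALENCE", "COMP%", 1),
    (3, 2, "WORD", "COMPTROLLER", 1),
    (4, 2, "WORD", "COMPUTER", 2),
    (5, 1, "EQUIVALENCE", "SMITH", 2),
    (6, 5, "WORD", "SMITH", 1),
    (7, 5, "WORD", "SMYTHE", 2),
]

def render(rows, root_parent=0):
    """Return indented lines, one per node, children listed under their parent."""
    children = defaultdict(list)
    for row in rows:
        children[row[1]].append(row)          # group nodes by PARENT_ID

    lines = []

    def walk(parent_id, depth):
        # Siblings are processed in POSITION order.
        for node_id, _, operation, object_name, _ in sorted(
            children[parent_id], key=lambda r: r[4]
        ):
            label = operation if object_name is None else f"{operation} {object_name}"
            lines.append("  " * depth + label)
            walk(node_id, depth + 1)

    walk(root_parent, 0)                      # the root node has PARENT_ID = 0
    return lines

lines = render(ROWS)
print("\n".join(lines))
```

Sorting siblings by POSITION keeps the output stable even if the rows arrive in a different order than the one shown.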