Oracle8 ConText Cartridge Application Developer's Guide Release 2.3 A58164-01 |
|
This chapter describes the different query methods you can use in your ConText application. You can use these methods with text queries and theme queries. The following topics are covered:
Each of the query methods (two-step, one-step, and in-memory) provide advantages and disadvantages that you must consider when developing an application. The following table briefly describes each method and illustrates the various advantages and disadvantages to using each:
To perform a two-step query, do the following:
The document textkeys and scores are stored in the specified result table.
The following example shows a simple two-step query. The query uses a policy named ARTICLES_POL to search the text column in a table named TEXTTAB for any articles that contain the word petroleum. The CONTAINS procedure populates the CTX_TEMP results table with the document primary keys that satisfy the query.
The select statement then joins the results in CTX_TEMP with TEXTAB to create a list of document titles ordered by score.
Note that before the two-step query example is executed, the result table, CTX_TEMP, is created:
create table CTX_TEMP( textkey varchar2(64), score number, conid number); execute ctx_query.contains('ARTICLE_POLICY','petroleum','CTX_TEMP') SELECT SCORE, title FROM CTX_TEMP, TEXTTAB WHERE texttab.PK=ctx_temp.textkey ORDER BY SCORE DESC;
In this example, the articles with the highest scores appear first in the hitlist because the results are sorted by score in descending order.
In a two-step query, the score results generated by the CONTAINS procedure are physically stored in a result table that has been allocated (either by the application developer or dynamically within the application).
If you want to include scores in the hitlist returned by a two-step query, select the from the result table in the second step of the query.
.
Note: The way in which ConText calculates a relevance score for text queries is different than the way it calculates scores for theme queries. To learn more about how ConText calculates relevance score for text queries, see "Scoring" in Chapter 1. To learn more about how ConText calculates relevance scores for theme queries, see "Understanding Theme Queries" in Chapter 4. |
In two-step queries, ConText uses result tables called hitlist tables to store intermediate results. Intermediate results can be merged into the standard SQL query through a join operation or a sub-query operation. The result tables must be created before the query is performed. A hitlist table can be created manually or allocated through the CTX_QUERY.GETTAB procedure.
Hitlist tables can be named anything; however, they must have the following structure:
See Also:
For more information about the structure of the hitlist result tables, see "Hitlist Table Structure" in Appendix A. |
For applications that support multiple concurrent users, ConText allows for sharing a single result table among all the users rather than allocating a separate table for each user.
You control sharing of result tables with the sharelevel and the query_id parameters of the CTX_QUERY.CONTAINS procedure. If the result table is shared, the CONTAINS procedure must specify that sharelevel is equal to one and include a unique query_id so that each result can be distinguished from others in the result table.
When sharelevel is equal to 0:
When sharelevel is equal to 1 then:
When you execute a two-step query on a table with a composite textkey, the number of textkey columns in the result table must match the composite keys count in the document table. For example, if you want to execute a query on a document table that has a two-column textkey, create a result table with the following schema: TEXTKEY, TEXTKEY2, SCORE, CONID.
The following SQL*Plus examples show two different ways in which to create a result table with a two-column composite textkey:
/* create composite textkey result table manually */ create table ctx_temp( textkey varchar2(64), textkey2 varchar2(64), score number, conid number); /* allocate composite textkey result table with CTX_QUERY.GETTAB() */ exec ctx_query.gettab(CTX_QUERY.HITTAB, :hit_tab, 2)
See Also: For more information on the structure of composite textkey result tables, see "Composite Textkey Hitlist Tables" in Appendix A.
There is an alternative to the second step of a two-step query. Rather than joining the result table and text table in a SELECT statement, you can create a view to perform the join. Then use a SELECT statement to select the appropriate rows from that view. Use this approach when the development tool does not allow tables to be joined in a SELECT statement (e.g. Oracle Forms).
For example:
CREATE VIEW SURVEY AS SELECT * FROM TEXTTAB, CTX_TEMP WHERE PK = TEXTKEY;
SELECT SCORE, AUTHOR FROM SURVEY ORDER BY SCORE DESC;
In this example:
To execute a two-step query on a table with a composite textkey, you first specify the multiple textkey columns when you create the policy for the text column.
See Also:
For more information about creating policies for composite textkey tables, see Oracle8 ConText Cartridge Administrator's Guide. |
In addition, before the two-step query, create a result table in which the number of TEXTKEY columns match the number of columns in the composite textkey in the document table. You can create the result table manually or using the CTX_QUERY.GETTAB procedure.
See Also:
For more information on the structure of composite textkey result tables, see "Composite Textkey Hitlist Tables" in Appendix A. |
For example, to create a result table manually with a composite textkey consisting of two columns, issue the following SQL statement:
create table CTX_TEMP2( textkey varchar2(64), textkey2 varchar2(64), score number, conid number);
In the two-step query, use the AND operator in the WHERE condition when you join the result and text tables. For example:
exec ctx_query.contains('ARTICLE2_POLICY','petroleum','CTX_TEMP2') SELECT SCORE, title FROM CTX_TEMP2, TEXTTAB2 WHERE texttab2.PK=ctx_temp2.textkey AND texttab2.PK2=ctx_temp2.textkey2 ORDER BY SCORE DESC;
A structured query is a query based on a text column and a structured data column. The structured data column is usually in the same table as the text column. For example, you might use a structured query to retrieve documents on a certain subject that were written after a certain date, where the document content is in a text column and date information is in a structured data column.
The CTX_QUERY.CONTAINS procedure provides an additional parameter, struct_query, for specifying the WHERE condition in a structured query. For example, to select all news articles that contain the word Oracle that were written on or after October 1st, 1996, you might use:
exec ctx_query.contains('news_text','Oracle','res_tab', struct_query => 'issue_date >= (''1-OCT-1996'')')
Executing a structured query with the struct_query parameter improves performance over processing a query on a text column and then refining the hitlist by applying a where condition against a structured column. This is especially so when the selectivity of the WHERE condition is high, because when you use the structured query parameter, the ConText server executes the entire query without first writing out a potentially large hitlist to be refined later by the Oracle server.
Note: If the user who includes a structured query in a two-step query is not the owner of the table containing the structured and text columns, the user must have SELECT privilege with GRANT OPTION on the table. In addition, if the object being queried is a view, the user must have SELECT privilege with GRANT OPTION on the base table for the view. SELECT privilege with GRANT OPTION can be granted to a user using the GRANT command in SQL. For more information, see Oracle8 Server SQL Reference. |
If a database link has been created for a remote database, two-step queries support querying text columns in the remote database.
Note: Database links are created using the CREATE DATABASE LINK command in SQL. For more information about creating database links, see Oracle8 Server SQL Reference. |
To perform a two-step query for a text column in a remote database, specify the database link for the remote database in the CONTAINS procedure as part of the policy for the column in the remote database.
In addition, the result table specified in CONTAINS must exist in the remote database, and you, the user performing the query, must have the appropriate privileges on the result table.
For example:
exec ctx_query.contains('MY_POL@DB1', 'petroleum','CTX_TEMP')
In this example, MY_POL exists in a remote database identified by the database link DB1. The CTX_TEMP result table exists in the same remote database.
See Also:
For more information about remote queries and distributed databases, see Oracle8 Server Concepts. |
The CONTAINS procedure provides an argument for processing two-step queries in parallel. Processing queries in parallel helps balance the load between ConText servers and might improve query performance.
When the CONTAINS procedure is called in a two-step query, the PARALLEL argument can be used to specify the number of ConText servers, up to the total number of ConText servers running with the Query personality, that are used to process two-step queries and write the results to the result table.
For example:
exec ctx_query.contains('ARTICLE_POLICY','petroleum', 'CTX_TEMP', parallel=>2)
In this example, the text column in the ARTICLE_POLICY policy is queried for documents that contain the term petroleum. The query is processed in parallel by any two available ConText servers with the Query personality and the results are written to CTX_TEMP.
The one-step query uses the CONTAINS and SCORE functions in a SQL statement to execute a user's request for documents. Rows and columns containing the text and structured data for relevant documents are returned to the application program as a record set like any other query in SQL.
.
Note: Before one-step queries can be executed, the database in which the text resides must be text enabled by setting the ConText initialization parameter TEXT_ENABLE = TRUE. This can be done by either setting it in the initsid.ora system initialization file, or by using the ALTER SESSION command. For more information about initialization parameters and the initsid.ora file, see Oracle8 Server Administrator's Guide. For more information about using the ALTER SESSION command, see Oracle8 Server SQL Reference. |
After a user has submitted a one-step query, ConText performs the following tasks to return the results to the user:
The following SELECT statement shows a simple one-step query. This query searches a text table called TEXTTAB for any articles that contain the word petroleum.
SELECT * FROM texttab WHERE CONTAINS (text, 'petroleum') > 0;
Because ConText functions execute within normal SQL statements, all of the capabilities for selecting and querying normal structured data fields, as well as text, are available. For instance, in the example, if the text table had a column listing the date the article was published, the user could select articles based on that date as well as the content of the text column.
.
Note: The asterisk wildcard character ( *) in specifies that the record set returned by the query includes all the columns of the text table for the selected documents, as well as the scores generated for each document. If a query has more than one CONTAINS function, the asterisk wildcard does not return scores for the multiple CONTAINS and the SCORE function must be called explicitly. See "Scoring" in this chapter for an example. |
One-step queries support calling more than one CONTAINS functions in the WHERE clause of a SELEC statement. Multiple CONTAINS can be used in a one-step query to perform queries on multiple text columns located either in the same table or in separate tables.
If multiple ConText servers with the Query personality are running and a one-step query with multiple CONTAINS is executed, the query is processed in parallel. Each CONTAINS function is evaluated by one of the available ConText servers and the results from the servers are combined before they are returned to the user.
In a one-step query, the document scores are generated by the CONTAINS function and returned by the SCORE function.
Each CONTAINS function in a query produces a separate score. When there are multiple CONTAINS functions, each CONTAINS function must have a label (a number) so the SCORE value can be identified in other clauses of the SELECT statement.
The SCORE function may be used in a SELECT list, an ORDER BY clause or a GROUP BY clause.
For example:
SELECT SCORE (10), SCORE(20), title FROM DOCUMENTS WHERE CONTAINS (TEXT, 'holmes', 10) > 0 OR CONTAINS (TEXT, 'moriarty', 20) > 0 OR CONTAINS (TEXT, 'baker street', 30) > 0 ORDER BY SCORE(10) GROUP BY SCORE(30)
Note:The way in which ConText calculates a relevance score for text queries is different than the way it calculates scores for theme queries.
To learn more about how ConText calculates relevance score for text queries, see "Scoring" in Chapter 1.
To learn more about how ConText calculates relevance scores for theme queries, see "Understanding Theme Queries" in Chapter 4.
The CONTAINS function can only appear in the WHERE clause of a SELECT statement.
You cannot issue the CONTAINS function in the WHERE clause of an UPDATE, INSERT or DELETE statement.
For a text column that has more than one policy associated with it, you must specify which policy to use in the CONTAINS clause using the pol_hint parameter.
You might create two policies for a column when you want to perform both theme and text queries on the column, or in any application where you build two separate indexes for a text column.
See Also:
For more information on issuing one-step queries with multiple policies see "Theme Query Examples" in Chapter 4. To learn more about using the pol_hint parameter, see the specification for the SELECT Statement in Chapter 9. |
You can perform one-step queries on text tables with composite textkeys. The syntax for the query is the same as the syntax for a query on a table with a single-column textkey.
If a database link has been created for a remote database, one-step queries support querying text columns in the remote database.
To perform a one-step query for a text column in a remote database, the database link for the remote database is specified as part of the table name in the SELECT clause.
For example:
SELECT * FROM texttab@db1 WHERE CONTAINS (text, 'petroleum') > 0;
In this example, texttab exists in a remote database identified by the database link DB1
Note: One-step queries do not support querying LONG and LONG RAW columns in remote database tables. For more information about creating database links, see Oracle8 Server SQL Reference. For more information about remote queries and distributed databases, see Oracle8 Server Concepts. |
In-memory queries use a buffer and a cursor to return query results. Returning query results to a buffer in memory improves performance over writing and reading query results to and from database result tables, which is typical of one- and two-step queries.
To perform an in-memory query, do the following:
In addition, you can specify that OPEN_CON return additional columns (up to five) for the selected documents from the text table.
The following example shows a simple in-memory query. This query uses a policy named ARTICLES_POL to search the text column in a table named TEXTTAB for any articles that contain the word petroleum.
declare score char(5); pk char(5); curid number; title char(256); begin dbms_output.enable(100000); curid := ctx_query.open_con( policy_name => 'ARTICLES_POL', text_query => 'petroleum', score_sorted => true, other_cols => 'title'); while (ctx_query.fetch_hit(curid, pk, score, title)>0) loop dbms_output.put_line(score||pk||substr(title,1,50)); end loop; ctx_query.close_con(curid); end;
In this example, the TITLE column from the table is also returned by OPEN_CON, so a variable must be declared for TITLE.
DBMS_OUTPUT.ENABLE sets the buffer size to the maximum of 100000 bytes (1 Mb) to ensure that the buffer is large enough to hold the results of the query.
The SCORE_SORTED argument in OPEN_CON is set to true which causes OPEN_CON to store the hits in the query buffer in descending order by score.
FETCH_HIT is called in a loop to fetch SCORE, PK, and TITLE for each hit until a value less than zero is returned, indicating that the buffer is empty.
DBMS_OUTPUT.PUT_LINE prints the results to the standard output.
See Also:
For more information about the DBMS_OUTPUT PL/SQL package, see Oracle8 Server Application Developer's Guide. |
You can perform in-memory queries on text tables that have multiple column textkeys. When you use CTX_QUERY.FETCH_HIT to retrieve each hit from the buffer, the PK argument is returned as an encoded string. To access an individual textkey, you must use CTX_QUERY.PKDECODE.
In-memory queries have the following limitation:
You cannot use the max and first/next operators with in-memory queries.
If a database link has been created for a remote database, in-memory queries support querying text columns in the remote database.
Note: Database links are created using the CREATE DATABASE LINK command in SQL. For more information about creating database links, see Oracle8 Server SQL Reference. |
To perform an in-memory query for a text column in a remote database, the database link for the remote database is specified in the CTX_QUERY.OPEN_CON procedure as part of the policy for the column in the remote database.
In addition, the result table specified in CTX_QUERY.CONTAINS must exist in the remote database and the user performing the query must have the appropriate privileges on the result table.
See Also:
For more information about remote queries and distributed databases, see Oracle8 Server Concepts. |
In addition to two-step, one-step, and in-memory queries, you can count the number of hits in a two-step or in-memory query. The documents can be stored in a local or remote database. Counting query hits helps to audit queries to ensure large and unmanageable hitlists are not returned.
You can count the number of hits before or after you issue the query using one of the following functions:
Before you issue a two-step or in-memory query, you can use the CTX_QUERY.COUNT_HITS function to return the number of hits for the query without generating scores for the hits or returning the textkeys for the documents.
COUNT_HITS can be called in two modes, estimate and exact. The results in estimate mode may be inaccurate; however, the results are generally returned faster than in exact mode
.You can use the CTX.QUERY.COUNT_LAST function to obtain the number of hits in a two-step query and in-memory query after issuing CONTAINS or OPEN_CON.
COUNT_LAST returns the number of hits obtained from the last call to CTX_QUERY.CONTAINS or CTX_QUERY.OPEN_CON.
For two-step queries, the time it takes to issue the query with CONTAINS and then to call COUNT_LAST is not as fast as calling COUNT_HITS before the query. However, in the case where you need to process all hits in a two-step query, issuing the query with CONTAINS and then calling COUNT_LAST is more efficient than calling COUNT_HITS and then calling CONTAINS.
With in-memory queries, issuing OPEN_CON and then calling COUNT_LAST is always a more efficient way to obtain an estimate of the query hits over calling COUNT_HITS and then calling OPEN_CON, since COUNT_LAST returns a number faster than COUNT_HITS.