Oracle8 ConText Cartridge Workbench User's Guide Release 2.3 A57700-01 |
|
This chapter describes how to use the TextServer3 Dictionary Migration Tool.
The topics covered in this chapter are:
The TextServer3 Dictionary Migration Tool is for users who need to migrate from Oracle TextServer3 to Oracle8 ConText Cartridge. The tool will:
The Migration Tool is a PL/SQL package, CTXWMG_MIGRATE, that is installed on the same database as ConText. It queries the TextServer3 dictionary to find as much useful information as possible that can be used in the creation of column policies. This is done using a database link to the database where the TextServer3 text dictionary resides. It also uses the database link to access any TextServer3 thesauri or section groups that are to be migrated.
The PL/SQL package that implements the tool makes use of the UTL_FILE package. This package adds File I/O capabilities to PL/SQL in Oracle8 Server, Release 8.0.
The Migration Tool is intended for use by application developers or TextServer3/ConText administrators. It is assumed that the user has an understanding of the following:
Familiarity with the Oracle8 Server IMP and EXP utilities is also assumed.
All references to migration refer to the TextServer3/ConText data dictionary migration process. All references to the Migration Tool refer to the TextServer3/ConText Dictionary Migration Tool.
See Also:
For more information about policies, preferences, and attributes, see Oracle8 ConText Cartridge Administrator's Guide |
Although TextServer3 and ConText share similar functionality, there are differences that prevent the automatic migration of text tables, thesauri and section groups. The user or application developer needs to make decisions about preferences and policies using their knowledge about the text application.
For this reason, the Migration Tool does not directly create column policies and thesauri in the ConText data dictionary. Instead, it generates:
The generated scripts are edited by the user in order to customize the migration to suit the needs of the application. The user needs to alter the creation of some preferences because of differences in ConText functionality (eg. filters supported internally) or to take advantage of new functionality.
The Migration Tool is aimed at migrating from any version of TextServer3 to Oracle8 ConText Cartridge 2.3.6. The database used by each product must be accessible to the user carrying out the migration.
The table that is a TextServer3 text table must exist on the target database (where ConText is installed). It need not be populated, but must have a column that has a unique or primary key constraint on it - this column must be the TextServer3 textkey column. The exp/imp utilities can be used for this purpose, or the TextServer3 database can be upgraded to 8.0.4.
ConText 2.3.6 must be installed on a 8.0.4 database. The user who uses the Migration Tool to generate scripts and load files need not be the user who ultimately owns the column policies and thesauri. The Migration Tool user must have privileges that allow the creation and dropping of database links for each TextServer3 user whose text tables and thesauri are to be migrated.
TextServer3 has areas of functionality that are available in ConText, but the ConText functionality does not exactly parallel the TextServer3 functionality. One of these areas is Stored Query Expressions (SQEs). In TextServer3, an SQE can be a query against multiple text columns. These text columns do not have to be in the same text table. In addition, the query also supports the full 'SELECT ... FROM ... WHERE ...' clauses in a SQL statement.
The ConText implementation binds an SQE to a single column policy and specifies only the query expression. For these reasons, no attempt is made to migrate TextServer3 SQEs to ConText.
Another area of TextServer3 functionality that differs from the ConText implementation is the thesaurus. A term in a TextServer3 thesaurus or synonym ring can be a proper term, a reference to another thesaurus or synonym ring, or a reference to an SQE. In ConText, a term can only be a proper term or phrase. When doing the thesaurus migration, the user has control over what happens when a term is a thesaurus or synonym ring, but no attempt to process a term is made when that term is an SQE.
If section groups are to be migrated, the section groups must be created in the ConText data dictionary before policy creation and indexing is carried out.
This section provides details for installing the Migration Tool.
During installation of the ConText Workbench, the installation scripts for the CTXWMG_MIGRATE package are copied to the client machine and an icon is created for executing the scripts; however the scripts are not executed to create the package.
Installing the Migration Tool involves creating the CTXWMG_MIGRATE package in the database on the server machine.
Before installing the Migration Tool, you must perform the following tasks:
The Migration Tool, once installed on the ConText database, can generate files for the tables and thesauri that are to be migrated. These files are owned by the owner of the Oracle8 shadow process. Therefore, choose a directory that can be accessed by the Oracle Server user who runs SQL*Plus and ctxload. This is because the generated files may need to be customized.
Accessible directories must be specified in the INIT.ORA file for the 8.0.4 instance. Each accessible directory is specified by the following parameter:
UTL_FILE_DIR = <directory name> UTL_FILE_DIR = <directory name> ...
The named director(ies) are then available to the PL/SQL File I/O package.
You may require that the names of database links are different from the name of the database where they are linking. To enforce this, add the following parameter to INIT.ORA:
GLOBAL_NAMES = FALSE
The database instance where ConText is installed must be restarted so that the UTL_FILE_DIR and GLOBAL_NAMES parameters can take effect.
The Migration Tool uses a database link to access the TextServer3 dictionary. The link is called ts3db.
For every TextServer3 user that owns a text table or thesaurus to be migrated, the link must be dropped, and then recreated.
Creating a Link
Links should be created by the Oracle user that runs the Migration Tool. From SQL*Plus, use the following syntax:
SQL> create database link ts3db connect to <user> 2 identified by <pw> using '<connect_string>';
Dropping a Link
When the database link must be created for another TextServer3 user, drop the existing ts3db link:
SQL> drop database link ts3db;
To install the CTXWMG_MIGRATE package, double-click on the TextServer3 Migration Tool Installation icon.
During text table migration, a script is generated that creates, as closely as possible, a policy that reflects an indexable column in a TextServer3 text table. The user must specify a prefix for the naming of the preferences in the policy. If a TextServer3 text table contains multiple indexable columns, then a script is created for each column.
ConText has a concept of preferences and policies. A preference specifies one of the indexing options that are necessary for creating a text index for a column:
The preferences are grouped into a policy and the policy is assigned to a column. Then, a text index is created for the column policy.
A preference consists of a Tile, which identifies the indexing option for the preference, and a number of attributes for the Tile. The attributes are set for a Tile using the CTX_DDL.SET_ATTRIBUTE procedure.
See Also:
For more information about policies, preferences, and attributes, see Oracle8 ConText Cartridge Administrator's Guide |
A brief outline of how the Migration Tool handles each type of preference is given here.
If the TextServer3 column is EXTERNAL, then the OSFILE Tile is used to create a Data Store preference. If an External File Path for EXTERNAL columns is specified, then the path attribute of OSFILE is set appropriately. Otherwise, the predefined CTXSYS.DEFAULT_DIRECT_DATASTORE preference is used.
If the TextServer3 text table contains documents of a single format supported internally by ConText, then the FILTER attribute is set appropriately. If multiple formats are used, that are all supported internally by ConText, the predefined CTXSYS.AUTOB preference is used. However, if unsupported formats are used, the tool generates a commented-out list of CTX_DDL.SET_ATTRIBUTE calls, for the executable attribute, to match the formats that are recognized by Autorec. The user should then uncomment the appropriate ones and edit the executable name.
The BASIC LEXER Tile is used to create the Lexer preference. If the TextServer3 text column has column-level options defined, then these are used when setting lexer attributes, otherwise table-level options are used. The options map as follows:
TextServer3 Option | BASIC LEXER Tile Attribute(s) |
---|---|
Alpha Join |
printjoins |
Numeric Join |
numjoin, numgroup |
Continuation |
continuation |
case Conversion = BASE |
BASE_LETTER set to 1 |
If a particular TextServer3 option is not set, then the equivalent BASIC LEXER attribute value is not set.
Additionally, if the TextServer3 text column references a section group, then the following two BASIC LEXER Tile attributes are set:
Attribute Name | Value(s) |
---|---|
startjoin |
'<' and '/' |
endjoin |
'>' |
The user may wish to edit these.
The GENERIC ENGINE Tile is used to create the Engine preference. The resulting script contains a call to CTX_DDL.SET_ATTRIBUTE for each attribute of GENERIC ENGINE, but they are commented out. The values need to be edited by the user as appropriate, and the line uncommented.
The GENERIC WORDLIST tile is used to create the Wordlist preference. For the stclause and instclause attributes, a commented-out call to CTX_DDL.SET_ATTRIBUTE() exists.
stemmer is set to 1 (English) and fuzzy_match is set to 1 (English).
If the TextServer3 text column references a section group, then the section_group attribute is set appropriately.
If there are no stop words in the appropriate TextServer3 wordlist, then the predefined CTXSYS.NO_STOPLIST preference is used.
If there are less than 4096 lowercase stop words in the TextServer3 table wordlist with a length of less than 65 characters, then these words are used to make up a stoplist.
If there are more than 4096 stop words, the CTXSYS.DEFAULT_STOPLIST preference is used.
The generation of a script is straightforward after the tool has been installed and the UTL_FILE_DIR initialization parameter has been set.
The user needs to decide which table to migrate and what the prefix for the preference names are going to be. Then, the user calls the CTXWMG_MIGRATE.MIGRATE procedure to create the necessary scripts for the migration.
The preference names that the tool generates in the scripts are a concatenation of the prefix supplied by the user, the name of the preference category, and the name of the column for which each script is created.
The generated scripts contain calls to CTX_DDL.SET_ATTRIBUTE that have been commented out with the SQL*Plus command rem. The user must examine the script to determine whether such calls need uncommenting. The comments are used for attributes where the Migration Tool is unable to determine a reasonable value.
An example of this is for the storage parameters used by various tables and indexes in the Engine preference.
This example uses the MIGRATE procedure to create column policy scripts.
In this example, a TextServer3 table, owned by jbloggs, is called resume and has two indexable columns called cv and address. The directory that has been chosen as the target directory for the scripts is called /usr/home/joebloggs/migrate. The prefix convention chosen is RESUME_APP. A database link called ts3db has been created for jbloggs.
To generate scripts, the procedure MIGRATE is called:
begin ctxwmg_migrate.migrate (tabname=>'RESUME', pref_conv=>'RESUME_APP', scriptdir=>'/usr/home/joebloggs/migrate'); end;
From this example, two scripts would be generated, one for each of the indexable columns in resume:
The migration of TextServer3 thesauri and synonym rings to ConText is not as straightforward as the migration of TextServer3 tables. This is because in ConText each term in a thesaurus must be a phrase. In TextServer3, narrower terms can be another thesaurus or a synonym ring. A related term can also be a thesaurus or synonym ring. This means that the functionality of a TextServer3 thesaurus is determined just as much by its structure and its relationship with other thesauri/synonym rings, as by the query operators made available.
Another difference is that a thesaurus in TextServer3 can contain only one hierarchy. In ConText, a thesaurus can contain multiple hierarchies. A ConText thesaurus can also contain synonym rings. The load files create Standard Narrower Term hierarchies.
A term in a ConText synonym ring must be unique to that synonym ring within a particular thesaurus. All synonym ring terms are given a qualifier so that conflicts do not arise. The qualifier is the name of the TextServer3 synonym ring.
The Migration Tool provides four procedures for use in the migration of thesauri and synonym rings. Each of these procedures generates a load file that can be used with ctxload. The ctxload utility can be used for loading thesauri into the ConText dictionary.
See Also:
For more information about thesauri and synonym rings, see Oracle8 ConText Cartridge Administrator's Guide |
Section groups can be migrated from TextServer3 to ConText. The generated script should be run from SQL*Plus to create the section group and add sections to it.
See Also:
For more information about sections and section groups, see Oracle8 ConText Cartridge Administrator's Guide |
The CTXWMG_MIGRATE PL/SQL package contains the following stored procedures:
The MIGRATE procedure creates a SQL script for a TextServer3 text table. When the script is run, it generates a ConText policy for an existing table in a ConText database.
CTXWMG_MIGRATE.MIGRATE_(tabname IN VARCHAR2, pref_conv IN VARCHAR2, scriptdir IN VARCHAR2);
Specify the name of the TextServer3 text table that contains one or more indexable columns for which you want to create a ConText policy.
Specify the string of characters used to generate a prefix for the preference names in the script.
Specify the directory for the script generated by MIGRATE.
The preference names that the tool generates in the script are a concatenation of the prefix supplied by the user, the name of the preference category, and the name of the column for which the script is being created.
For example, if the tool is creating a script for the doc column, and the user specifies a prefix of myapp, then the Engine preference is called myapp_engine_doc.
If the preference is being created for an internal filter, then the name of the preference category is substituted with four letters representing the format (e.g. WP51 for WordPerfect 5.1).
The MIGRATE_ALL_SYNS procedure creates a single load file for all the synonym rings owned by the user.
CTXWMG_MIGRATE.MIGRATE_ALL_SYNS(loaddir IN VARCHAR2);
Specify the directory for load file generated by MIGRATE_ALL_SYNS.
No 'in-place' expansion occurs.
When the load file generated by MIGRATE_ALL_SYNS is used to generate a ConText thesaurus, all the synonym rings from TextServer3 are created within one ConText thesaurus.
The name of the generated load file is based on the user ID for the TextServer3 user. For example, for a user with a TextServer3 user ID of 3, the generated file is called ALL_SYNONYMS_3.syn.
The MIGRATE_ALL_THES procedure creates a single load file for all the thesauri owned by the user.
CTXWMG_MIGRATE.MIGRATE_ALL_THES(loaddir IN VARCHAR2);
Specify the directory for load file generated by MIGRATE_ALL_THES.
No 'in-place' expansion occurs.
When the load file generated by MIGRATE_ALL_THES is used to generate a ConText thesaurus, all the hierarchies from the TextServer3 thesauri are created within one ConText thesaurus.
The name of the generated load file is based on the user ID for the TextServer3 user. For example, for a user with a TextServer3 user ID of 3, the generated file is called ALL_THESAURI_3.ths.
The MIGRATE_SG procedure generates a script for a named section group.
CTXWMG_MIGRATE.MIGRATE_SG(group_name IN VARCHAR2, loaddir IN VARCHAR2);
Specify the name of the section group to be migrated.
Specify the directory for script generated by MIGRATE_SG.
The name of the generated script is based on the name of the TextServer3 section group. For example, for a TextServer3 section group called MYSECTIONGROUP, the generated script is called SG_MYSECTIONGROUP.sql.
The MIGRATE_SYN procedure generates a load file for a named TextServer3 synonym ring.
CTXWMG_MIGRATE.MIGRATE_SYN(syn_sname IN VARCHAR2, loaddir IN VARCHAR2);
Specify the name of the synonym ring to be migrated.
Specify the directory for load file generated by MIGRATE_SYN.
No 'in-place' expansion occurs.
The name of the TextServer3 synonym ring is used as a qualifier for each term.
The name of the generated load file is based on the thesaurus name and the user ID for the TextServer3 user. For example, for a TextServer3 synonym ring called MYSYNRING belonging to a user with a TextServer3 user ID of 3, the generated file is called MYSYNRING_3.ths.
The MIGRATE_THES procedure generates a load file for a named TextServer3 thesaurus.
CTXWMG_MIGRATE.MIGRATE_THES(thesname IN VARCHAR2, t_expand_mode IN INTEGER DEFAULT 0, s_expand_mode IN INTEGER DEFAULT 0, dump_syns IN INTEGER DEFAULT 0, loaddir IN VARCHAR2);
Specify the name of the thesaurus to be migrated.
Specify the expansion mode for referenced thesauri:
0 No thesaurus expansion
1 Expand user referenced thesauri in-place
2 Expand user and public referenced thesauri in-place
s_expand_modeSpecify the expansion for referenced synonym rings:
0 No synonym ring expansion
1 Expand single user synonym ring in-place
2 Expand multiple user synonym rings in-place
3 Expand single user or public synonym ring in-place
4 Expand multiple user or public synonym rings in-place
dump_synsSpecify to dump all synonym rings owned by current user.
Specify the directory for load file generated by MIGRATE_THES.
The t_expand_mode and s_expand_mode parameters allow users to control what happens when the thesaurus hierarchy contains a narrower term that is a synonym ring.
If thesaurus expansion is used, a referenced thesaurus is expanded 'in-place'. This means that instead of the term being a reference to a thesaurus, it becomes the top-term of the referenced thesaurus. This effectively makes the referenced hierarchy part of the main hierarchy. If a thesaurus that has been made part of the main hierarchy references other thesauri itself, then those thesauri also become part of the main hierarchy. Referenced thesauri may be owned by the user or may be public. Expansion of referenced thesauri can be limited to those owned by the user if required.
In-place synonym ring expansion is similar to thesaurus expansion, but the terms in the referenced ring become narrower terms of the referencing term. The user can limit expansion to those synonym rings owned by the user. It is possible for a synonym ring term to be a reference to another synonym ring. Expansion of such a synonym ring would result in those terms also being narrower terms of the term in the thesaurus. The user can control this, as well as being able to limit the expansion to synonym rings owned by the user.
If a value of 1 is specified for the dump_syns parameter, MIGRATE_THES dumps all synonym rings into the same load file. When the load file is used to create a thesaurus in ConText, all the dumped TextServer3 synonym rings are part of that ConText thesaurus.
The name of the generated load file is based on the thesaurus name and the user ID for the TextServer3 user. For example, for a TextServer3 thesaurus called MYTHES belonging to a user with a TextServer3 user ID of 3, the generated file is called MYTHES_3.ths.
A sample script is supplied with the Migration Tool. It can be altered to suit the user's requirements. It prompts the user for values.
The following example makes use of the sample script to demonstrate how a user's text table and thesaurus can be migrated from TextServer3 to ConText.
In this example, the TextServer3 user is jbloggs on a TextServer3 database using a connect string 'prod_db'. The jbloggs user has a TextServer3 user ID of 3 and one text table called odc_papers:
The summary column and the documents in the paper column are in the same format, which is either Word for Windows 6 or WordPerfect 6 format. Both of these formats are supported internally by ConText. Neither of these two columns are external.
The application uses a single thesaurus, called MY_THESAURUS, but it does contain terms that are references to other thesauri and synonym rings. Some of these are owned by the user, others are public. The user requires the same terms to be returned from operations on the thesaurus using ConText.
The directory chosen for creating the column policy script and load files is /usr/home/joebloggs/migrate. The script creates the database link, and drops it after finishing the script generation.
The steps for migrating the text table and thesaurus are:
% sqlplus jbloggs/jbloggs @samp_mig
Enter the following information at the prompts:
jbloggs jbloggs prod_db /usr/home/joebloggs/migrate odc_papers joes_app 2 4 1
The Migration Tool creates three script files in the directory /usr/home/joebloggs/migrate:
If these scripts are then run from SQL*Plus, they create column policies for each of the text columns. Both column policies use the Autorec functionality to invoke the internal filters since the Migration Tool determines that all the formats used are supported by ConText. Before running the scripts, the user may choose to edit the preference name or any of the attribute values.
The thesaurus is migrated to ConText by using the ctxload utility. Assuming that the thesaurus should keep the name 'MY_THESAURUS', the following command would import the thesaurus:
% ctxload -user jbloggs/jbloggs -name MY_THESAURUS -file MY_THESAURUS_3.ths -thes
This completes the migration of the TextServer3 Dictionary information for the text table, ODC_PAPERS. The user should now import the data from the text table in the TextServer3 database and create a ConText index for the data.