Oracle8 Tuning Release 8.0 A58246-01 |
|
Oracle supports transparent distributed queries to access data from multiple databases. It also provides many other distributed features, such as transparent distributed transactions and a transparent fully automatic two-phase commit. This chapter explains how the Oracle8 optimizer decomposes SQL statements, and how this affects performance of distributed queries. The chapter provides guidelines on how to influence the optimizer and avoid performance bottlenecks.
Topics include:
If a SQL statement references one or more remote tables, the optimizer first determines whether all remote tables are located at the same site. If all tables are located at the same remote site, Oracle sends the entire query to the remote site for execution. The remote site sends the resulting rows back to the local site. This is called a remote SQL statement. If the tables are located at more than one site, the optimizer decomposes the query into separate SQL statements to access each of the remote tables. This is called a distributed SQL statement. The site where the query is executed, called the "driving site," is normally the local site.
This section describes:
If a SQL statement references multiple tables, then the optimizer must determine which columns belong to which tables before it can decompose the SQL statement. For example, with
SELECT DNAME, ENAME FROM DEPT, EMP@REMOTE WHERE DEPT.DEPTNO = EMP.DEPTNO
the optimizer must first determine that the DNAME column belongs to the DEPT table and the ENAME column to the EMP table. Once the optimizer has the data dictionary information of all remote tables, it can build the decomposed SQL statements.
Column and table names in decomposed SQL statements appear between double quotes. You must enclose in double quotes any column and table names that contain special characters, reserved words, or spaces.
This mechanism also replaces an asterisk (*) in the select list with the actual column names. For example:
SELECT * FROM DEPT@REMOTE;
results in the decomposed SQL statement
SELECT A1."DEPTNO", A1."DNAME", A1."LOC" FROM "DEPT" A1;
Note: For the sake of simplicity, double quotes are not used in the remainder of this chapter.
If the entire SQL statement is sent to the remote database, the optimizer uses table aliases A1, A2, and so on, for all tables and columns in the query, in order to avoid possible naming conflicts. For example:
SELECT DNAME, ENAME FROM DEPT@REMOTE, EMP@REMOTE WHERE DEPT.DEPTNO = EMP.DEPTNO;
is sent to the remote database as
SELECT A2.DNAME, A1.ENAME FROM DEPT A2, EMP A1 WHERE A1.DEPTNO = A2.DEPTNO;
When a query accesses data on one or more databases, one site "drives" the execution of the query. This is known as the "driving site"; it is here that the data is joined, grouped and ordered. By default, the local Oracle server will be the driving site. A hint called DRIVING_SITE enables you to manually specify the driving site.
The decomposition of SQL statements is important because it determines the number of records or even tables that must be sent through the network. A knowledge of how the optimizer decomposes SQL statements can help you achieve optimum performance for distributed queries.
If a SQL statement references one or more remote tables, the optimizer must decompose the SQL statement into separate queries to be executed on the different databases. For example:
SELECT DNAME, ENAME FROM DEPT, EMP@REMOTE WHERE DEPT.DEPTNO = EMP.DEPTNO;
might be decomposed into
SELECT DEPTNO, DNAME FROM DEPT;
which is executed locally, and
SELECT DEPTNO, ENAME FROM EMP;
which is sent to the remote database. The data from both tables is joined locally. All this is done automatically and transparently for the user or application.
In some cases, however, it might be better to send the local table to the remote database and join the two tables on the remote database. This can be achieved either by creating a view, or by using the DRIVING_SITE hint. If you decide to create a view on the remote database, a database link from the remote database to the local database is also needed.
For example (on the remote database):
CREATE VIEW DEPT_EMP AS SELECT DNAME, ENAME FROM DEPT@LOCAL, EMP WHERE DEPT.DEPTNO = EMP.DEPTNO;
Then select from the remote view instead of the local and remote tables
SELECT * FROM DEPT_EMP@REMOTE;
Now the local DEPT table is sent through the network to the remote database, joined on the remote database with the EMP table, and the result is sent back to the local database.
See Also: "DRIVING_SITE" on page 8-27 for details about this hint.
Rule-based optimization does not have information about indexes for remote tables. It never, therefore, generates a nested loops join between a local table and a remote table with the local table as the outer table in the join. It uses either a nested loops join with the remote table as the outer table or a sort merge join, depending on the indexes available for the local table.
Cost-based optimization can consider more execution plans than rule-based optimization. Cost-based optimization knows whether indexes on remote tables are available, and in which cases it would make sense to use them. Cost-based optimization considers index access of the remote tables as well as full table scans, whereas rule-based optimization considers only full table scans.
The particular execution plan and table access that cost-based optimization chooses depends on the table and index statistics. For example, with
SELECT DNAME, ENAME FROM DEPT, EMP@REMOTE WHERE DEPT.DEPTNO = EMP.DEPTNO
the optimizer might choose the local DEPT table as the driving table and access the remote EMP table using an index. In that case the decomposed SQL statement becomes
SELECT ENAME FROM EMP WHERE DEPTNO = :1
This decomposed SQL statement is used for a nested loops operation.
If tables are on more than one remote site, it can be more effective to create a view than to use the DRIVING_SITE hint. If not all tables are on the same remote database, the optimizer accesses each remote table separately. For example:
SELECT D.DNAME, E1.ENAME, E2.JOB FROM DEPT D, EMP@REMOTE E1, EMP@REMOTE E2 WHERE D.DEPTNO = E1.DEPTNO AND E1.MGR = E2.EMPNO;
results in the decomposed SQL statements
SELECT EMPNO, ENAME FROM EMP;
and
SELECT ENAME, MGR, DEPTNO FROM EMP;
If you want to join the two EMP tables remotely, you can create a view to accomplish this. Create a view with the join of the remote tables on the remote database. For example (on the remote database):
CREATE VIEW EMPS AS SELECT E1.DEPTNO, E1.ENAME, E2.JOB FROM EMP E1, EMP E2 WHERE E1.MGR = E2.EMPNO;
and now select from the remote view instead of the remote tables:
SELECT D.DNAME, E.ENAME, E.JOB FROM DEPT D, EMPS@REMOTE E WHERE D.DEPTNO = E.DEPTNO;
This results in the decomposed SQL statement
SELECT DEPTNO, ENAME, JOB FROM EMPS;
In a distributed query, all hints are supported for local tables. For remote tables, however, you can use only join order and join operation hints. (Hints for access methods, parallel hints, and so on, have no effect.) For remote mapped queries, all hints are supported.
EXPLAIN PLAN gives information not only about the overall execution plan of SQL statements, but also about the way in which the optimizer decomposes SQL statements. EXPLAIN PLAN stores information in the PLAN_TABLE table. If remote tables are used in a SQL statement, the OPERATION column will contain the value REMOTE to indicate that a remote table is referenced, and the OTHER column will contain the decomposed SQL statement that will be sent to the remote database. For example:
EXPLAIN PLAN FOR SELECT DNAME FROM DEPT@REMOTE SELECT OPERATION, OTHER FROM PLAN_TABLE OPERATION OTHER --------- ------------------------------------- REMOTE SELECT A1."DNAME" FROM "DEPT" A1
Note the table alias and the double quotes around the column and table names.
See Also: Chapter 23, "The EXPLAIN PLAN Command"
You can utilize partition views to bring together tables that have the same structure, but contain different partitions of data. This can be very useful for a distributed database system, where each partition resides on a database and the data in each partition has common geographical properties.
When a query is executed on such a partition view, and the query contains a predicate that contains the result set to a subset of the view's partitions, the optimizer chooses a plan which skips partitions that are not needed for the query. This partition elimination takes place at run time, when the execution plan references all partitions.
This section describes the circumstances under which a UNION ALL view enables the optimizer to skip partitions. The Oracle server that contains the partition view must conform to the following rules:
Note: To use the cost-based optimizer you must analyze all tables used in the UNION ALL views. Alternatively, you can use a hint or set the parameter OPTIMIZER_MODE to ALL_ROWS or FIRST_ROW. To set OPTIMIZER_MODE or PARTITION_VIEW_ENABLED you can also use the ALTER SESSION statement.
Within a UNION ALL view there are multiple select statements, and each of these is called a "branch". A UNION ALL view is a partition view if each select statement it defines conforms to the following rules:
Partition elimination is based on column transitivity with constant predicates. The WHERE clause used in the query that accesses the partition view is pushed down to the WHERE clause of each of the branches in the UNION ALL view definition. Consider the following example:
SELECT * FROM EMP_VIEW WHERE deptno=30;
when the view EMP_VIEW is defined as
SELECT * FROM EMP@d10 WHERE deptno=10 UNION ALL SELECT * FROM EMP@d20 WHERE deptno=20 UNION ALL SELECT * FROM EMP@d30 WHERE deptno=30 UNION ALL SELECT * FROM EMP@d40 WHERE deptno=40
The "WHERE deptno=30" predicate used in the query is pushed down to the queries in the UNION ALL view. For a WHERE clause such as "WHERE deptno=10 and deptno=30", the optimizer applies transitivity rules to generate an extra predicate of "10=30". This extra predicate is always false, thus the table (EMP@d10) need not be accessed.
Transitivity applies to predicates which conform to the following rules:
relation AND relation ...
where relation is of the form
column_name relop constant_expression
and relop is one of =, !=, >, >=, <, <=
Note that BETWEEN ... AND is allowed by these rules, but IN is not.
To confirm that the system recognizes a partition view, check the EXPLAIN PLAN output. The following operations will appear in the OPERATIONS column of the EXPLAIN PLAN output, if a query was executed on a partition view:
If PARTITION does not appear in the option column of the UNION-ALL operation, the partition view was not recognized, and no partitions were eliminated. Make sure that the UNION ALL view adheres to the rules as defined in "Rules for Use" on page 9-8.
The following example shows a partition view CUSTOMER that is partitioned into two partitions. The EAST database contains the East Coast customers, and the WEST database contains the customers from the West Coast.
The WEST database contains the following table CUSTOMER_WEST:
CREATE TABLE CUSTOMER_WEST ( cust_no NUMBER CONSTRAINT CUSTOMER_WEST_PK PRIMARY KEY, cname VARCHAR2(10), location VARCHAR2(10) );
The EAST database contains the database CUSTOMER_EAST:
CREATE TABLE CUSTOMER_EAST ( cust_no NUMBER CONSTRAINT CUSTOMER_EAST_PK PRIMARY KEY, cname VARCHAR2(10), location VARCHAR2(10) );
The following partition view is created at the EAST database (you could create a similar view at the WEST database):
CREATE VIEW customer ASSELECT * FROM CUSTOMER_EAST WHERE location='EAST' UNION ALL SELECT * FROM CUSTOMER_WEST@WEST WHERE location='WEST';
If you execute the following statement, notice that the CUSTOMER_WEST table in the WEST database is not accessed:
EXPLAIN PLAN FOR SELECT * FROM CUSTOMER WHERE location='EAST';
As shown in the EXPLAIN PLAN output, the optimizer recognizes that the CUSTOMER_WEST partition need not be accessed:
SQL> r1 select lpad(' ',level*3-3)||operation operation,cost,options,OPERATION COST OPTIONS OBJECT_NOD OTHER ------------------------- ---- ---------- ---------- ------------------------- SELECT STATEMENT 1 VIEW 1 UNION-ALL PARTITION TABLE ACCESS 1 FULL FILTER REMOTE 1 WEST.WORLD SELECT "CUST_NO","CNAME", "LOCATION" FROM "CUSTOMER _WEST" "CUSTOMER_WEST" WH ERE "LOCATION"='EAST' AND "LOCATION"='WEST'object_node, other2 from plan_table 3 connect by parent_id = prior id 4* start with parent_id IS NULL
Distributed queries within the same version of Oracle have these restrictions:
The Transparent Gateways are used to access data from other data sources (relational databases, hierarchical databases, file systems, and so on). Transparent Gateways provide a means to transparently access data from a non-Oracle system, just as if it were another Oracle database.
When a SQL statement accesses data from non-Oracle systems, it is said to be a heterogeneous distributed SQL statement. To optimize heterogeneous distributed SQL statements, follow the same guidelines as for optimizing distributed SQL statements that access Oracle databases only. However, you must take into consideration that the non-Oracle system usually does not support all the functions and operators that Oracle8 supports. The Transparent Gateways therefore tell Oracle (at connect time) which functions and operators they do support. If the other data source does not support a function or operator, Oracle will perform that function or operator. In this case Oracle obtains the data from the other data source and applies the function or operator locally. This affects the way in which the SQL statements are decomposed and can affect performance, especially if Oracle is not on the same machine as the other data source.
You can use partition views with Oracle Transparent Gateways version 8 or higher. Make sure you adhere to the rules that are defined in "Rules for Use" on page 9-8. In particular:
You can improve performance of distributed queries in several ways:
In many cases there are several SQL statements which can achieve the same result. If all tables are on the same database, the difference in performance between these SQL statements might be minimal; but if the tables are located on different databases, the difference in performance might be bigger.
Cost-based optimization can use indexes on remote tables, considers more execution plans than rule-based optimization, and generally gives better results. With cost-based optimization performance of distributed queries is generally satisfactory. Only in rare occasions is it necessary to change SQL statements, create views, or use procedural code.
In some situations, views can be used to improve performance of distributed queries; for example:
In some rare occasions it can be more efficient to replace a distributed query by procedural code, such as a PL/SQL procedure or a precompiler program. Note that this option is mentioned here only for completeness, not because it is often needed.