Icon Optimizations

Introduction
DBISAM uses available indexes when optimizing SQL queries so that they execute in the least amount of time possible. In addition, joins are re-arranged to allow for the least number of joins as possible since joins tend to be fairly expensive in DBISAM.

Index Selection
DBISAM will use an available index to optimize any expression in the WHERE clause of an SQL SELECT, UPDATE, or DELETE statement. It will also use an available index to optimize any join expressions between multiple tables. This index selection is based on the following rules:

1) DBISAM only uses the first field of any given index for optimization. This means that if you have an index containing the fields LastName and FirstName, then DBISAM can only use the this index for optimizing any conditions that refer to the LastName field.

2) DBISAM can use both ascending and descending indexes for optimization.

3) DBISAM will only use case-sensitive indexes for optimizing any conditions on string fields unless the condition contains the UPPER() or LOWER() SQL function. In such a case DBISAM will only look for and use case-insensitive indexes for optimizing the condition. Conditions on non-string fields such as integer or boolean fields can always use any index that contains the same field, regardless of the index's case-insensitivity setting.

4) DBISAM can mix and match the optimization of conditions so that it is possible to have one condition be optimized and the other not. This is known as a partially-optimized query.

How DBISAM Builds the Query Results
Once an index is selected for optimizing a given condition of the WHERE clause, a range is set on the index in order to limit the index keys to those that match the current condition being optimized. The index keys that satisfy the condition are then scanned, and during the scan a bitmap is built in physical record number order. A bit is turned on if the physical record satisfies the condition, and a bit is turned off if it doesn't. This method of using bitmaps works well because it can represent sets of data with minimal memory consumption. Also, DBISAM is able to quickly determine how many records are in the set (how many bits are turned on), and it can easily AND, OR, and NOT bitmaps together to fulfill boolean logic between multiple conditions. Finally, because the bitmap is in physical record order, accessing the records using a bitmap is very direct since DBISAM uses fixed-length records with directly-addressable offsets in the physical table format.

When optimizing SQL SELECT queries that contain both join conditions and WHERE conditions, DBISAM always processes the non-join conditions first if the conditions do not affect the target table, which is the table on the right side of a LEFT OUTER JOIN or the table on the left side of a RIGHT OUTER JOIN. This can speed up join operations tremendously since the join conditions will only take into account the records existing in the source table(s) based upon the WHERE conditions. For example, consider the following query:

SELECT
OrderHdr.Cust_ID,
OrderHdr.Order_Num,
OrderDet.Model_Num,
OrderDet.Cust_Item
FROM OrderHdr, OrderDet
WHERE OrderHdr.Order_Num=OrderDet.Order_Num AND
      OrderHdr.Cust_ID='C901'
ORDER BY 1,2,3

In this example, the WHERE condition:

OrderHdr.Cust_ID='C901'

will be evaluated first before the join condition:

OrderHdr.Order_Num=OrderDet.Order_Num

so that the joins only need to process a small number of records in the OrderHdr table.

When optimizing SQL SELECT queries that contain INNER JOINs that also contain selection conditions (conditions in an INNER JOIN clause that do not specify an actual join), the selection conditions are always processed at the same time as the join, even if they affect the target table, which is the table on the right side of the join. This can speed up join operations tremendously since the join conditions will only take into account the records existing in the source table(s) based upon the selection conditions. For example, consider the following query:

SELECT
OrderHdr.Cust_ID,
OrderHdr.Order_Num,
OrderDet.Model_Num,
OrderDet.Cust_Item
FROM OrderHdr INNER JOIN OrderDet ON
OrderHdr.Order_Num=OrderDet.Order_Num AND OrderHdr.Cust_ID='C901'
ORDER BY 1,2,3

In this example, the selection condition:

OrderHdr.Cust_ID='C901'

will be evaluated first before the join condition:

OrderHdr.Order_Num=OrderDet.Order_Num

so that the joins only need to process a small number of records in the OrderHdr table.

Information If an SQL SELECT query can return a live result set, then the WHERE clause conditions are applied to the source table via an optimized filter and the table is opened. If an SQL SELECT query contains joins or other items that cause DBISAM to only return a canned result set, then all of the records from the source tables that satisfy the WHERE clause conditions and join conditions are copied to a temporary table on disk and that table is opened as the query result set. This process can be time-consuming when a large number of records are returned by the query, so it is recommended that you try to make your queries as selective as possible.

How Joins are Processed
Join conditions in SQL SELECT, UPDATE, or DELETE statements are processed in DBISAM using a technique known as nested-loop joins. This means that DBISAM recursively processes the source tables in a master-detail, master-detail, etc. arrangement with a driving table and a destination table (which then becomes the driving table for any subsequent join conditions). When using this technique, it is very important that the table with the smallest record count (after any non-join conditions from the WHERE clause have been applied) is specified as the first driving table in the processing of the joins. DBISAM's SQL optimizer will automatically optimize the join ordering so that the table with the smallest record count is placed as the first driving table as long as the joins are INNER JOINS or SQL-89 joins in the WHERE clause. LEFT OUTER JOINs and RIGHT OUTER JOINs cannot be re-ordered in such a fashion and must be left alone.

The following is an example that illustrates the nested-loop joins in DBISAM:

SELECT c.Company,
o.OrderNo,
e.LastName,
p.Description,
v.VendorName
FROM Customer c, Orders o, Items i, Vendors v, Parts p, Employee e
WHERE c.CustNo=o.CustNo AND
o.OrderNo=i.OrderNo AND
i.PartNo=p.PartNo AND 
p.VendorNo=v.VendorNo AND
o.EmpNo=e.EmpNo
ORDER BY e.LastName

In this example, DBISAM would process the joins in this order:

1) Customer table joined to Orders table on the CustNo column

2) Orders table joined to Items table on the OrderNo column and Orders table joined to Employee table on EmpNo column (this is also known as a multi-way, or star, join)

3) Items table joined to Parts table on the PartNo column

4) Parts table joined to Vendors table on the VendorNo column

In this case the Customer table is the smallest table in terms of record count, so making it the driving table in this case is a good choice. Also, you'll notice that in the case of the multi-way, or star, join between the Orders table and both the Items and Employee table, DBISAM will move the join order of the Employee table up in order to keep the join ordering as close to the order of the source tables in the FROM clause as possible.

Information You can use the NOJOINOPTIMIZE keyword at the end of the SQL SELECT, UPDATE, or DELETE statement in order to tell DBISAM not to reorder the joins. Also, SQL UPDATE and DELETE statements cannot have their driver table reordered due to the fact that the driver table is the table being updated by these statements.

Query Plans
You can use the TDBISAMQuery GeneratePlan property to indicate that you want DBISAM to generate a query plan for the current SQL statement or script when it is executed. The resulting query plan will be stored in the TDBISAMQuery Plan property. Examining this query plan can tell you exactly what the SQL optimizer is doing when executing a given SQL statement or script. For example, the query mentioned above would generate the following query plan:

================================================================================
SQL statement
================================================================================

SELECT c.Company,
o.OrderNo,
e.LastName,
p.Description,
v.VendorName
FROM Customer c, Orders o, Items i, Vendors v, Parts p, Employee e
WHERE c.CustNo=o.CustNo AND
o.OrderNo=i.OrderNo AND
i.PartNo=p.PartNo AND 
p.VendorNo=v.VendorNo AND
o.EmpNo=e.EmpNo
ORDER BY e.LastName

Result Set Generation
---------------------

Result set will be canned
Result set will consist of one or more rows
Result set will be ordered by the following column(s) using a case-sensitive 
temporary index:

LastName ASC

Join Ordering
-------------

The driver table is the Customer table (c)

The Customer table (c) is joined to the Orders table (o) with the INNER JOIN 
expression:

c.CustNo = o.CustNo

The Orders table (o) is joined to the Items table (i) with the INNER JOIN 
expression:

o.OrderNo = i.OrderNo

The Orders table (o) is joined to the Employee table (e) with the INNER JOIN 
expression:

o.EmpNo = e.EmpNo

The Items table (i) is joined to the Parts table (p) with the INNER JOIN 
expression:

i.PartNo = p.PartNo

The Parts table (p) is joined to the Vendors table (v) with the INNER JOIN 
expression:

p.VendorNo = v.VendorNo

Optimized Join Ordering
-----------------------

The driver table is the Vendors table (v)

The Vendors table (v) is joined to the Parts table (p) with the INNER JOIN 
expression:

v.VendorNo = p.VendorNo

The Parts table (p) is joined to the Items table (i) with the INNER JOIN 
expression:

p.PartNo = i.PartNo

The Items table (i) is joined to the Orders table (o) with the INNER JOIN 
expression:

i.OrderNo = o.OrderNo

The Orders table (o) is joined to the Customer table (c) with the INNER JOIN 
expression:

o.CustNo = c.CustNo

The Orders table (o) is joined to the Employee table (e) with the INNER JOIN 
expression:

o.EmpNo = e.EmpNo

Join Execution
--------------

Costs ARE NOT being taken into account when executing this join
Use the JOINOPTIMIZECOSTS clause at the end of the SQL statement to force the 
optimizer to consider costs when optimizing this join

The expression:

v.VendorNo = p.VendorNo

is OPTIMIZED

The expression:

p.PartNo = i.PartNo

is OPTIMIZED

The expression:

i.OrderNo = o.OrderNo

is OPTIMIZED

The expression:

o.CustNo = c.CustNo

is OPTIMIZED

The expression:

o.EmpNo = e.EmpNo

is OPTIMIZED

================================================================================

You'll notice that the joins have been re-ordered to be in the most optimal order. You'll also notice that the query plan mentions that the JOINOPTIMIZECOSTS clause is not being used. Use a JOINOPTIMIZECOSTS clause to force the query optimizer to use I/O cost projections to determine the most efficient way to process the conditions in a join expression. If you have a join expression with multiple conditions in it, then using this clause may help improve the performance of the join expression, especially if it is already executing very slowly.

Further Optimizations Provided by DBISAM
In addition to just using indexes to speed up the querying process, DBISAM also provides a few other optimizations that can greatly increase a given query's performance. When building a bitmap for a given optimized condition, DBISAM can take advantage of statistics that are kept in DBISAM indexes. These statistics accurately reflect the current make-up of the various values present in the index, and DBISAM uses this information to optimize the actual scan of the index.

DBISAM looks at the optimization of the query conditions, and when multiple conditions are joined by an AND operator, DBISAM ensures that the most optimized query condition is executed first. For example, consider a table of 25,000 records with the following structure:

Customer table

Field       Data Type      Index
---------------------------------------------------
ID          Integer        Primary Index
Name        String[30]
State       String[2]      Secondary, case-sensitive,
                           non-unique, ascending, index
TotalOrders BCD[2]

And consider the following SQL SELECT query:

SELECT *
FROM customer
WHERE (TotalOrders > 10000) and (State='CA')

As you can see, the TotalOrders condition cannot be optimized since no indexes exist that would allow for optimization, whereas the State condition can be optimized. If only 200 records in the table have a State field that contains 'CA', then processing the query in the order indicated by the expression would be very inefficient, since the following steps would take place:

1) All 25,000 physical records would be read and evaluated to build a bitmap for the (TotalOrders > 10000) condition.

2) The resultant bitmap from the previous step would be ANDed together with a bitmap built using the optimized index scan for the State condition.

DBISAM uses a much better approach because it knows that:

1) The TotalOrders condition is not optimized

2) The State condition is optimized

3) Both conditions are joined using the AND operator

it is able to reverse the query conditions in the WHERE clause and execute the index scan for the 200 records that satisfy the State condition first, and then proceed to only read the 200 records from disk in order to evaluate the TotalOrders condition. DBISAM has just saved a tremendous amount of I/O by simply reversing the query conditions.

Information This optimization only works with query conditions that are joined by the AND operator. If the above two conditions were joined using the OR operator, then DBISAM would simply read all 25,000 records and evaluate the entire WHERE expression for each record.

In the case of a completely un-optimized query, DBISAM's read-ahead buffering can help tremendously in reducing network traffic and providing the most efficient reads with the least amount of I/O calls to the operating system. DBISAM will read up to 32 kilobytes of contiguous records on disk in the course of processing an un-optimized query.

DBISAM can also optimize for the UPPER() and LOWER() SQL functions by using any case-insensitive indexes in the source tables to optimize the query condition. Take the following table for example:

Customer table

Field       Data Type      Index
---------------------------------------------------
ID          Integer        Primary Index
Name        String[30]
State       String[2]      Secondary, case-insensitive,
                           non-unique, ascending, index

And consider the following SQL SELECT query:

SELECT *
FROM customer
WHERE (UPPER(State)='CA')

In this query, DBISAM will be able to select and use the case-insensitive index on the State field, and this is caused by the presence of the UPPER() function around the field name. This can also be used to optimize joins. For example, here are two tables that use case-insensitive indexes for optimizing joins:

Customer table

Field       Data Type      Index
---------------------------------------------------
ID          String[10]     Primary, case-insensitive
                           index
Name        String[30]
State       String[2]

Orders table

Field       Data Type      Index
---------------------------------------------------
OrderNum    String[20]     Primary, case-insensitive
                           index
CustID      String[10]     Secondary, case-insensitive
                           index
TotalAmount BCD[2]

And consider the following SQL SELECT query:

SELECT *
FROM Customer, Orders
WHERE (UPPER(Customer.ID)=UPPER(Orders.CustID))

In this query, the join condition will be optimized due to the presence of the UPPER() function around the Orders.CustID field. The UPPER() function around the Customer.ID field is simply to ensure that the join is made on upper-case customer ID values only.

Optimization Levels
DBISAM determines the level of optimization for a WHERE or JOIN clause using the following rules:

Optimized Condition = Fully-Optimized WHERE or JOIN clause

Un-Optimized Condition = Un-Optimized WHERE or JOIN clause

Optimized Condition AND Optimized Condition = Fully-
Optimized WHERE or JOIN clause

Optimized Condition AND Un-Optimized Condition = Partially-
Optimized WHERE or JOIN clause

Un-Optimized Condition AND Optimized Condition = Partially-
Optimized WHERE or JOIN clause

Un-Optimized Condition AND Un-Optimized Condition = Un-
Optimized WHERE or JOIN clause

Optimized Condition OR Optimized Condition = Fully-
Optimized WHERE or JOIN clause

Optimized Condition OR Un-Optimized Condition = Un-
Optimized WHERE or JOIN clause

Un-Optimized Condition OR Optimized Condition = Un-
Optimized WHERE or JOIN clause

Un-Optimized Condition OR Un-Optimized Condition = Un-
Optimized WHERE or JOIN clause

Information The unary NOT operator causes any expression to become partially optimized. This is due to the fact that DBISAM must scan for, and remove, deleted records from the current records bitmap once it has taken the bitmap and performed the NOT operation on the bits.

DBISAM Limitations
DBISAM does not optimize multiple query conditions joined by an AND operator) by mapping them to a compound index that may be available. To illustrate this point, consider a table with the following structure:

Employee

Field       Data Type      Index
----------------------------------------------------------------------
LastName    String[30]     Primary Index  (both fields are part of the
FirstName   String[20]     Primary Index   primary index)

And consider the following query:

SELECT *
FROM Employee
WHERE (LastName='Smith') and (FirstName='John')

Logically you would assume that DBISAM can use the one primary index in order to optimize the entire WHERE clause. Unfortunately this is not the case, and instead DBISAM will only use the primary index for optimizing the LastName condition and resort to reading records in order to evaluate the FirstName condition.
Image