Icon Filter Optimization

Introduction
DBISAM's filter optimizations rely on the use of available indexes and bitmaps in order to facilitate the quick retrieval and manipulation of sets of records that satisy all, or a portion of, a set of filter constraints.

Setting the Filter Expression
When an expression filter is set on a table in DBISAM using the TDBISAMTable or TDBISAMQuery Filter property, the following steps take place:

1) The filter expression is parsed and a set of token objects is created for each token in the expression.

2) The set of token objects is then examined for proper syntax and any errors in the filter expression are reported at this time.

3) The set of token objects is then examined again in order to determine the optimization level and make it available to the developer for examination via the TDBISAMTable or TDBISAMQuery FilterOptimizeLevel property. This process looks at the available indexes for each filter condition and uses this information to determine how optimized the filter expression is.

4) Once the filter is activated via the TDBISAMTable or TDBISAMQuery Filtered property, the optimization and filtering processes are performed.

How DBISAM Selects Indexes for Optimization
The first step in the optimization process is determining which indexes are available that can be used to speed up the filtering process. The rules for this index selection are as follows:

1) DBISAM only uses the first field of any given index for optimization. This means that if you have an index containing the fields LastName and FirstName, then DBISAM can only use this index for optimizing any filter conditions that refer to the LastName field.

2) DBISAM can use both ascending and descending indexes for optimization.

3) DBISAM will only use case-sensitive indexes for optimizing any filter conditions on string fields unless the foCaseInsensitive option is used with the TDBISAMTable or TDBISAMQuery FilterOptions property. You may also use the UPPER() or LOWER() functions on a column name to force DBISAM to use a case-insensitive index for optimizing the filter condition. Filter conditions on non-string fields such as integer or boolean fields can always use any index that contains the same field, regardless of the index's case-insensitivity setting.

4) DBISAM can mix and match the optimization of filter conditions so that it is possible to have one condition be optimized and the other not. This is known as a partially-optimized filter.

How DBISAM Builds the Filter Results
Once an index is selected for optimizing a given condition of the filter expression, a range is set on the index in order to limit the index keys to those that match the current filter condition being optimized. The index keys that satisfy the filter condition are then scanned, and during the scan a bitmap is built in physical record number order. A bit is turned on if the physical record satisfies the condition, and a bit is turned off if it doesn't. This method of using bitmaps works well because it can represent sets of data with minimal memory consumption. Also, DBISAM is able to quickly determine how many records are in the set (how many bits are turned on), and it can easily AND, OR, and NOT bitmaps together to fulfill boolean logic between multiple filter conditions. Finally, because the bitmap is in physical record order, accessing the records using a bitmap is very direct since DBISAM uses fixed-length records with directly-addressable offsets in the physical table format.

Further Optimizations Provided by DBISAM
In addition to just using indexes to speed up the filtering process, DBISAM also provides a few other optimizations that can greatly increase a given filter's performance. When building a bitmap for a given optimized condition, DBISAM can take advantage of statistics that are kept in DBISAM indexes. These statistics accurately reflect the current make-up of the various values present in the index.

DBISAM looks at the optimization of the filter conditions, and when multiple conditions are joined by an AND operator, DBISAM ensures that the most optimized filter condition is executed first. For example, consider a table of 25,000 records with the following structure:

Customer table

Field       Data Type      Index
---------------------------------------------------
ID          Integer        Primary Index
Name        String[30]
State       String[2]      Secondary, case-sensitive,
                           non-unique, ascending, index
TotalOrders BCD[2]

And consider the following filter:

(TotalOrders > 10000) and (State='CA')

As you can see, the TotalOrders condition cannot be optimized since no indexes exist that would allow for optimization, whereas the State condition can be optimized. If only 200 records in the table have a State field that contains 'CA', then processing the filter in the order indicated by the expression would be very inefficient, since the following steps would take place:

1) All 25,000 physical records would be read and evaluated to build a bitmap for the (TotalOrders > 10000) condition.

2) The resultant bitmap from the previous step would be ANDed together with a bitmap built using the optimized index scan for the State condition.

DBISAM uses a much better approach because it knows that:

1) The TotalOrders condition is not optimized

2) The State condition is optimized

3) Both conditions are joined using the AND operator

it is able to reverse the filter conditions in the WHERE clause and execute the index scan for the 200 records that satisfy the State condition first, and then proceed to only read the 200 records from disk in order to evaluate the TotalOrders condition. DBISAM has just saved a tremendous amount of I/O by simply reversing the filter conditions.

Information This optimization only works with filter conditions that are joined by the AND operator. If the above two conditions were joined using the OR operator, then DBISAM would simply read all 25,000 records and evaluate the entire filter expression for each record.

In the case of a completely un-optimized filter, DBISAM's read-ahead buffering can help tremendously in reducing network traffic and providing the most efficient reads with the least amount of I/O calls to the operating system. DBISAM will read up to 32 kilobytes of contiguous records on disk in the course of processing an un-optimized filter.

DBISAM can also optimize for the UPPER() and LOWER() functions by using any case-insensitive indexes in the table to optimize the filter condition. Take the following table for example:

Customer table

Field       Data Type      Index
---------------------------------------------------
ID          Integer        Primary Index
Name        String[30]
State       String[2]      Secondary, case-insensitive,
                           non-unique, ascending, index

And consider the following filter:

(UPPER(State)='CA')

In this filter, DBISAM will be able to select and use the case-insensitive index on the State field, and this is caused by the presence of the UPPER() function around the field name.

Optimization Levels
DBISAM determines the level of optimization for a filter using the following rules:

Optimized Condition = Fully-Optimized filter

Un-Optimized Condition = Un-Optimized filter

Optimized Condition AND Optimized Condition = Fully-
Optimized filter

Optimized Condition AND Un-Optimized Condition = Partially-
Optimized filter

Un-Optimized Condition AND Optimized Condition = Partially-
Optimized filter

Un-Optimized Condition AND Un-Optimized Condition = Un-
Optimized filter

Optimized Condition OR Optimized Condition = Fully-
Optimized filter

Optimized Condition OR Un-Optimized Condition = Un-
Optimized filter

Un-Optimized Condition OR Optimized Condition = Un-
Optimized filter

Un-Optimized Condition OR Un-Optimized Condition = Un-
Optimized filter

Information The unary NOT operator causes any expression to become partially optimized. This is due to the fact that DBISAM must scan for, and remove, deleted records from the current records bitmap once it has taken the bitmap and performed the NOT operation on the bits.

DBISAM Limitations
DBISAM does not optimize multiple filter conditions joined by an AND operator) by mapping them to a compound index that may be available. To illustrate this point, consider a table with the following structure:

Employee

Field       Data Type      Index
----------------------------------------------------------------------
LastName    String[30]     Primary Index  (both fields are part of the
FirstName   String[20]     Primary Index   primary index)

And consider the following filter:

(LastName='Smith') and (FirstName='John')

Logically you would assume that DBISAM can use the one primary index in order to optimize the entire filter. Unfortunately this is not the case, and instead DBISAM will only use the primary index for optimizing the LastName condition and resort to reading records in order to evaluate the FirstName condition.
Image