Icon Collations and Comparisons

Published on Mon, May 26 2008
ImageIn ElevateDB, a collation can be assigned to any VARCHAR, CHAR, or CLOB column, indexed column, variable, or parameter. Collations are the entity used in databases to dictate how VARCHAR, CHAR, and CLOB values are compared. Thus, they subsequently serve to dictate the sort order of columns and how they are searched. For a primer on collations in ElevateDB, please see the Internationalization topic in the ElevateDB SQL manual.

Using Collations to Affect Comparisons
With the SQL comparison operators such as =, <>, IN, and BETWEEN, any VARCHAR, CHAR, or CLOB column, variable, or parameter can be compared against another similarly-typed value. In such comparisons, the collation of the colum, variable, or parameter on the left-hand side of the comparison operator dictates which collation will be used for the comparion.

For example, the following table is the Albums table from the CD Collector sample application that ships with the ElevateDB VCL products:

CREATE TABLE "Albums"
(
"No" INTEGER GENERATED BY DEFAULT AS IDENTITY (START WITH 0, INCREMENT BY 1),
"Name" VARCHAR(50) COLLATE "ANSI_CI",
"Artist" VARCHAR(30) COLLATE "ANSI_CI" NOT NULL,
"Genre" VARCHAR(20) COLLATE "ANSI_CI" NOT NULL,
"Year" INTEGER NOT NULL,
"Label" VARCHAR(30) COLLATE "ANSI_CI" NOT NULL,
"Imported" BOOLEAN DEFAULT FALSE   NOT NULL,
"NumDiscs" INTEGER DEFAULT 1   NOT NULL,
"CoverArt" BLOB COMPRESSION 6,
"Comments" CLOB COLLATE "ANSI_CI",
"PurchasedOn" DATE,
"PurchasePrice" DECIMAL(20,2),
CONSTRAINT "No" PRIMARY KEY ("No"),
CONSTRAINT "Artist" FOREIGN KEY ("Artist") REFERENCES "Artists" ("Name"),
CONSTRAINT "Genre" FOREIGN KEY ("Genre") REFERENCES "Genres" ("Name"),
CONSTRAINT "Label" FOREIGN KEY ("Label") REFERENCES "Labels" ("Name"),
CONSTRAINT "UniqueKey_Name" UNIQUE ("Name")
)
DESCRIPTION 'Contains CD albums in the collection'

As you can see, the Genre column is defined with the ANSI_CI collation, which means the base ANSI collation with case-insensitivity.

The following SQL SELECT statement selects all albums where the Genre column is equal to 'ROCK':

SELECT *
FROM albums
WHERE Genre = 'ROCK'

And the result would be this:

Image

Because the column is defined with a case-insensitive collation, the comparison returns all of the rows where the Genre column contains 'Rock', even though the value being compared was 'ROCK'.

But, let's suppose that we want to force a case-sensitive comparison regardless of the collation that was specified for the column. We would do that using the COLLATE clause like this:

SELECT *
FROM albums
WHERE Genre COLLATE ANSI = 'ROCK'

The result would be an empty result set, like this:

Image

If we fix the SELECT statement so that the value being compared is the correct case:

SELECT *
FROM albums
WHERE Genre COLLATE ANSI = 'Rock'

then the result will be the same as the first case-insensitive comparison.

Now, comparisons like this are pretty common for SELECT statements and other DML statements like INSERT, UPDATE, and DELETE statements. But what about variable and parameter comparisons in scripts, stored procedures, functions, triggers, and jobs ? They too, work just like a column comparison. To illustrate, let's again go back to the CD Collector sample application, and the stored procedure called Summaries:

CREATE PROCEDURE Summaries(IN "SummaryType" CHAR(1) COLLATE ANSI_CI)
BEGIN
DECLARE Result CURSOR WITH RETURN FOR Stmt;

CASE SummaryType
-- Genres summary
WHEN 'G' THEN
   BEGIN
   PREPARE Stmt FROM 'SELECT Genre AS Name, COUNT(Name) AS NumAlbums,
                      SUM(PurchasePrice) AS TotalPurchases
                      FROM Albums
                      GROUP BY Genre';
   OPEN Result;
   END;
-- Labels summary
WHEN 'L' THEN
   BEGIN
   PREPARE Stmt FROM 'SELECT Label AS Name, COUNT(Name) AS NumAlbums,
                      SUM(PurchasePrice) AS TotalPurchases
                      FROM Albums
                      GROUP BY Label';
   OPEN Result;
   END;
-- Artists summary
WHEN 'A' THEN
   BEGIN
   PREPARE Stmt FROM 'SELECT Artist AS Name, COUNT(Name) AS NumAlbums,
                      SUM(PurchasePrice) AS TotalPurchases
                      FROM Albums
                      GROUP BY Artist';
   OPEN Result;
   END;
END CASE;
END

As you can see, the SummaryType parameter is defined with the collation ANSI_CI which, again, means the ANSI collation with case-insensitivity. By defining the collation at the parameter level, it means that every comparison where the parameter is located on the left-hand side of the comparison operator will result in the comparison being performed using the ANSI collation and case-insensitivity. The Summaries procedure uses this to effect a case-insensitive comparison in the CASE statement so that the proper cursor is returned irregardless of what case is used for the value passed in for the SummaryType parameter.

You can also force a specific collation at the point where the comparison is made, as illustrated by this version of the Summaries procedure:

CREATE PROCEDURE Summaries(IN "SummaryType" CHAR(1))
BEGIN
DECLARE Result CURSOR WITH RETURN FOR Stmt;

CASE SummaryType COLLATE ANSI_CI
-- Genres summary
WHEN 'G' THEN
   BEGIN
   PREPARE Stmt FROM 'SELECT Genre AS Name, COUNT(Name) AS NumAlbums,
                      SUM(PurchasePrice) AS TotalPurchases
                      FROM Albums
                      GROUP BY Genre';
   OPEN Result;
   END;
-- Labels summary
WHEN 'L' THEN
   BEGIN
   PREPARE Stmt FROM 'SELECT Label AS Name, COUNT(Name) AS NumAlbums,
                      SUM(PurchasePrice) AS TotalPurchases
                      FROM Albums
                      GROUP BY Label';
   OPEN Result;
   END;
-- Artists summary
WHEN 'A' THEN
   BEGIN
   PREPARE Stmt FROM 'SELECT Artist AS Name, COUNT(Name) AS NumAlbums,
                      SUM(PurchasePrice) AS TotalPurchases
                      FROM Albums
                      GROUP BY Artist';
   OPEN Result;
   END;
END CASE;
END

Using Collations to Affect Ordering and Grouping
In a SELECT statement, the GROUP BY and ORDER BY clauses use collations to determine how the result set rows are grouped and sorted. By default, the collations used are determined by the columns or expressions that are specified in the GROUP BY and/or ORDER BY clause. This is one of the interesting features in ElevateDB - the ability to specify different collations for different columns specified in the GROUP BY and/or ORDER BY clause.

Let's use the SELECT statement in the above Summaries procedure as an example:

SELECT Genre AS Name,
COUNT(Name) AS NumAlbums,
SUM(PurchasePrice) AS TotalPurchases
FROM Albums
GROUP BY Genre

If you remember, the Genre column is defined with the collation of ANSI_CI, which means that this SELECT statement will group any rows with 'Rock', 'ROCK', or 'rock' together, considering them all equal for the purposes of grouping. Furthermore, because ElevateDB defaults to sorting the result set according to the GROUP BY clause if no ORDER BY clause also exists, the result set will sorted by the Genre column using the ANSI_CI collation, as shown here:

Image

And, as mentioned above with comparisons, if we wanted to effect a case-sensitive grouping, we would use the COLLATE clause to override the collation defined for the column:

SELECT Genre AS Name,
COUNT(Name) AS NumAlbums,
SUM(PurchasePrice) AS TotalPurchases
FROM Albums
GROUP BY Genre COLLATE ANSI

Using Collations to Affect the MIN and MAX Aggregate Functions
One final aspect of collations that needs to be discussed is the MIN and MAX aggregate functions. When used with VARCHAR or CHAR columns, the MIN and MAX aggregate functions use the collation defined for the column to determine the lowest or highest value. For example, the following SELECT statement will use the case-insensitive ANSI collation defined for the Genre column to determine the lowest value:

SELECT MIN(Genre) AS Name
FROM Albums

To effect a case-sensitive MIN calculation, we again use the COLLATE clause to do so:

SELECT MIN(Genre COLLATE ANSI) AS Name
FROM Albums

Hopefully we've been able to give you a little better understanding of how collations work in ElevateDB, and possibly some cool things you can do with them that you weren't aware of. As always, if you have any questions at all, please feel free to post them in the ElevateDB newsgroup here:

ElevateDB General Support Forum

This will allow everyone to share in any new ideas or comments that you may have regarding this article.

The next technical article will be entitled "Building a Sales Quote Replication System in ElevateDB", so look for it soon.

Source Code Source Code for this Article
Image