Icon View Thread

The following is the text of the current message along with any replies.
Messages 11 to 15 of 15 total
Thread Filter optimization stumper
Thu, Feb 8 2007 1:46 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Chris,

<< I have to agree with Sam, as it took me quite a bit of swallowing to get
my head around this UPPER(xxx) = UPPER(yyy) after Informix and MSSQL where I
used WHERE <this field> = <'this value'> on case-insensitive indexes or
case-sensitive indexes depending on the way the application needs to work,
and got the appropriate result set. >>

There's no way that they allowed the use of either type of index with the
same field comparison.  That would break existing SQL statements just by
modifying the index definition.

<< Indeed, this does work differently for apparently identical SQL, but only
because the index says it should. This is exactly how specifying UNIQUE
works. >>

UNIQUE is a constraint and a completely different thing altogether.

<< What Sam wants, ISTM, and what I initially expected (and keep on
forgetting I won't get) is an index that silently takes a peek at its own
definition and acts accordingly, without the programmer having to add these
unnecessary and obfuscating constructs, just as it does to decide
whether it can accept a new row or not on a unique index. >>

How does that help ?  The index can't read the developers mind, so it has no
way of knowing what type of comparison is desired. Wink

<< There is also the problem of collation for e.g. French accented
characters, where é=è=ê=e=E in a case-insensitive index, except in DBISAM
where they are only the same if included in UPPER(). If you use LOWER() they
should not be seen as the same - I haven't tried that one on a case-
insensitive index in DBISAM. >>

If that's the case then it is a bug and needs to be corrected by removing
the LOWER() function's ability to force a case-insensitive comparison.
However, most people don't use the LOWER() function and opt for the UPPER()
function anyways.

--
Tim Young
Elevate Software
www.elevatesoft.com

Thu, Feb 8 2007 5:22 PMPermanent Link

Sam Karl
"Tim Young [Elevate Software]" <timyoung@elevatesoft.com> wrote:

Sam,

<< I noticed a lot of Delphi databases you have to apply "Upper(Col)='ABC'"
on a case insensitive index, and I just don't get it. Smile
MySQL indexes are case insenstive by default and we can use "Where
col='ABC'" or "col='AbC'" and it returns the same rows. So why can't the
database engine be smart enough to realize the index is case insensitive,
and add the Upper() part for us behind the scene. Otherwise why would
someone want to perform a case sensitive search on a case insensitive index
which means going through the entire table? >>

<Indexes don't even exist in the SQL standard, so forget them when it comes
to how an SQL statement should work.  SQL statements should work exactly as
written, which is not something that MySQL and others are doing if they take
a case-sensitive search and make it case-insensitive simply because an index
exists on the column being compared.  In fact, I have a hard-time believing
that this is even the case.  Perhaps what you meant to say was that, by
default, all MySQL string column comparisons are case-insensitive ?>

Yes, that's what I meant.

<EDB, for example, will allow you to perform a case-insensitive search on a
column without the UPPER() provided that the collation assigned to the
column includes the case-insensitive attribute.  However, that is different
from an index definition because it is standard part of a table definition
and results in the same rows being returned regardless of whether an index
exists or not.  DBISAM doesn't have column-level collations, so it has to
rely on the UPPER() construct for you to tell it that the comparison is
case-insensitive.

This is the key principle - searches and comparisons should perform in the
same fashion and return the exact same rows irrespective of any defined
indexes.  The SQL standard dictates that all comparions are case-sensitive
unless the collation used dictates otherwise.>

I understand your point. You're adhering to the SQL standard for compatibility reasons.
But to me, after using case insensitive searches for years without having to resort to
Upper() functions, it looks like the SQL standard is just plain wrong. At least it is more
labor intensive for the programmer and he is more apt to make a mistake. Maybe I should
start a letter writing campaign? Write my congressman or hire a lobbyist? Smile

Sam
Fri, Feb 9 2007 3:48 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Sam,

<< I understand your point. You're adhering to the SQL standard for
compatibility reasons. But to me, after using case insensitive searches for
years without having to resort to Upper() functions, it looks like the SQL
standard is just plain wrong. At least it is more labor intensive for the
programmer and he is more apt to make a mistake. Maybe I should start a
letter writing campaign? Write my congressman or hire a lobbyist? Smile>>

Actually the standard is to use a forced collation to effect a
case-insensitive comparison.  EDB had this in there along with the UPPER() /
LOWER() constructs to be compatible with DBISAM.  However, after Chris's
remarks regarding the LOWER() issue, I realized that it would be best if EDB
only used the forced collation route.  Basically, you would use something
like this to force a case-insensitive comparison in EDB:

SELECT *
FROM customer
WHERE Company COLLATE "ANSI_CI" LIKE 'Silicon%'

The SQL standard dictates that collations be exactly compatible on both
sides of a binary operation like the above or else an error is raised.
However, EDB relaxes this a bit and only uses the left side to determine
what collation should be used to affect the result of the binary operation.

--
Tim Young
Elevate Software
www.elevatesoft.com

Mon, Feb 12 2007 4:56 AMPermanent Link

Chris Erdal
"Tim Young [Elevate Software]" <timyoung@elevatesoft.com> wrote in
news:C811E1D8-FEB6-4C77-96A4-FA93D21B4E78@news.elevatesoft.com:

Tim,

 I sense a little agro in your response, and I think it's probably
justified:

> ... after Informix and MSSQL
> where I used WHERE <this field> = <'this value'> on case-insensitive
> indexes or case-sensitive indexes depending on the way the application
> needs to work, and got the appropriate result set. >>
>
> There's no way that they allowed the use of either type of index with
> the same field comparison.  That would break existing SQL statements
> just by modifying the index definition.

I had a look at one MSSQL 6.5 app I was thinking of, and in fact I had
only one index that was liable to be affected by case (because the others
were all numbers or uppercase codes) and that index was case-insensitive
because the WHOLE DATABASE was declared so by its collation when created,
so the field on which the index was based was case-insensitive.

So what remains of my point is simply that when collation specifies case-
insensitivity there is no need to add UPPER() around both sides for it to
work.

The corollary of that is of course that there's no way to select
different versions of a string for that field in index order, which meant
that the user complained he couldn't find mis-spelt distinct strings such
as "THis wOrd" to correct them because only "This Word" was visible when
ordering by this field.

So in fact your implementation is much more flexible, and from now on
I'll be thanking you for the added possibilities for the programmer,
instead of muttering "bloody UPPERs again" under my breath as I type!

>
><< Indeed, this does work differently for apparently identical SQL, but
>only
> because the index says it should. This is exactly how specifying
> UNIQUE works. >>
>
> UNIQUE is a constraint and a completely different thing altogether.

From the DB engineer's point of view, yes. From my point of view it's
just another special word that adds special characteristics to the way
things happen.

>
><< What Sam wants, ISTM, and what I initially expected (and keep on
> forgetting I won't get) is an index that silently takes a peek at its
> own
>  definition and acts accordingly, without the programmer having to add
>  these
> unnecessary and obfuscating constructs, just as it does to decide
>  whether it can accept a new row or not on a unique index. >>
>
> How does that help ?  The index can't read the developers mind, so it
> has no way of knowing what type of comparison is desired. Wink

what I meant was that one could specify what type of index one was
creating, and from then on it would be that type without our needing to
add UPPER() around each side for case-insensitivity. But that of course
removes the possibility of searching case-sensitively on a field if there
is a case-insensitive index on it.

>
><< There is also the problem of collation for e.g. French accented
> characters, where é=è=ê=e=E in a case-insensitive index, except in
> DBISAM where they are only the same if included in UPPER(). If you use
> LOWER() they should not be seen as the same - I haven't tried that one
> on a case- insensitive index in DBISAM. >>
>
> If that's the case then it is a bug and needs to be corrected by
> removing the LOWER() function's ability to force a case-insensitive
> comparison. However, most people don't use the LOWER() function and
> opt for the UPPER() function anyways.
>

I was asserting that off the top of my head - sorry - I haven't tested
it. And anyway, using UPPER all the time is simpler anyway, now I've
understood why it's required.

--
Chris
(XP-Pro + Delphi 7 Architect + DBISAM 4.25 build 3 + EDB 1.00 build 8)
Mon, Feb 12 2007 3:44 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Chris,

<< I sense a little agro in your response, and I think it's probably
justified: >>

Not at all, I'm just trying to be clear.  My apologies if it seemed
otherwise.

<< I had a look at one MSSQL 6.5 app I was thinking of, and in fact I had
only one index that was liable to be affected by case (because the others
were all numbers or uppercase codes) and that index was case-insensitive
because the WHOLE DATABASE was declared so by its collation when created, so
the field on which the index was based was case-insensitive. >>

Yes, that would be acceptable in that case because the case-sensitivity is
not index-dependent.

<< So what remains of my point is simply that when collation specifies
case-insensitivity there is no need to add UPPER() around both sides for it
to work. >>

Correct.

<< The corollary of that is of course that there's no way to select
different versions of a string for that field in index order, which meant
that the user complained he couldn't find mis-spelt distinct strings such as
"THis wOrd" to correct them because only "This Word" was visible when
ordering by this field. >>

Good point.

<< So in fact your implementation is much more flexible, and from now on
I'll be thanking you for the added possibilities for the programmer,
instead of muttering "bloody UPPERs again" under my breath as I type! >>

Well, the UPPER/LOWER stuff is now gone in EDB in favor the standard COLLATE
clause, so it is even easier:

MyColumn COLLATE "ANSI_CI" = 'MyValue'

<< From the DB engineer's point of view, yes. From my point of view it's
just another special word that adds special characteristics to the way
things happen. >>

True. Smiley

<< I was asserting that off the top of my head - sorry - I haven't tested
it. And anyway, using UPPER all the time is simpler anyway, now I've
understood why it's required. >>

You are correct - using LOWER() won't work in some cases because of the
collation rules.  At any rate, that way is gone so it won't be an issue
anymore.

--
Tim Young
Elevate Software
www.elevatesoft.com

« Previous PagePage 2 of 2
Jump to Page:  1 2
Image