Icon View Thread

The following is the text of the current message along with any replies.
Messages 21 to 30 of 43 total
Thread Display Duplicate Records in DBISAM
Wed, Jan 4 2006 4:17 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Robert,

<< That was not the original problem. He had a query like

SELECT ,,, where field in (select f from table group by f having count(*) >
1)

and DBISAM complained that he was selecting two fields in the subquery (f
and count). >>

I understood the original problem.  I was commenting on, and confirming, the
limitation that he was running into.

<< Seems to me that just as you can use non-selected fields in the WHERE
clause, you should be able to use non-selected fields in the HAVING clause.
>>

There is only one case where you would want/need to use a non-SELECT columns
in the HAVING clause, and that is the COUNT(*) expression.

--
Tim Young
Elevate Software
www.elevatesoft.com

Wed, Jan 4 2006 4:35 PMPermanent Link

"Robert"

"Tim Young [Elevate Software]" <timyoung@elevatesoft.com> wrote in message
news:B905F730-C781-4D99-9F65-EDDFF7D8E89F@news.elevatesoft.com...
>
> There is only one case where you would want/need to use a non-SELECT
columns
> in the HAVING clause, and that is the COUNT(*) expression.
>

Not with SUM?

Robert

Wed, Jan 4 2006 5:57 PMPermanent Link

"Ole Willy Tuv"
Tim,

<< The query evaluation logic is only superficial - how the engine
implements the actual processing is irrelevant as long as the result is
correct. >>

The query evaluation logic is relevant for how the result set is built.
DBISAM is fully capable of producing an incorrect result by allowing
non-grouping columns in the select list of a grouped query, which may lead
to ambigious data in the result. Another major problem is that DBISAM is not
able to execute plain, valid queries like the one Scott presented.

Another example is:

select col1, (col1 + col2), sum(col3)
from t
group by col1, col2
--having col2 = 10



A
> case in point is evaluating WHERE clauses before joins.  Per the SQL
> statement, they should never be evaluated until after the JOINs, but there
> are times when it is perfectly safe and proper to do so.
>
> --
> Tim Young
> Elevate Software
> www.elevatesoft.com
>
>

Wed, Jan 4 2006 6:20 PMPermanent Link

"Ole Willy Tuv"
Sorry, I hit the button before the message was finished.

Another example is:

select col1, (col1 + col2), sum(col3)
from t
group by col1, col2
having col2 = 10

DBISAM chokes on the col2 reference both in the GROUP BY and the HAVING
clause, and the reason is that DBISAM wrongly processes these clauses
against the select list (result set) columns, instead of in the context of
the table expression where they belong. This is a major misunderstanding of
how the GROUP BY and HAVING clauses work.

The query is plain, valid SQL that any other database engine would execute
correctly - without errors. There is two references in the select list to
columns of the table expression which are not contained in an aggregate
function, and both are correctly specified as grouping columns. The HAVING
clause has one reference to a grouping column of the table expression, so
there is no problem there either.

The fact that this query doesn't work illustrates that the query evaluation
logic in DBISAM has serious flaws.

Ole

Thu, Jan 5 2006 9:09 AMPermanent Link

"Ole Willy Tuv"
Tim,

<< A case in point is evaluating WHERE clauses before joins.  Per the SQL
statement, they should never be evaluated until after the JOINs, but there
are times when it is perfectly safe and proper to do so. >>

Of course, good query optimizers will detect situations where queries can be
safely optimized by rearranging the query specification. But they don't
optimize queries by breaking processing rules or common SQL behavior.

I guess you meant a situation like this:

select t1.*
from t1
left join t2 on t1.pk = t2.fk
where t1.pk <= 10

The result set of this query is known to only contain rows with values equal
to or less than 10 in the t1.pk column. The query can therefore safely be
optimized by rephrasing the left table reference:

select t1.*
from (
 select *
 from t1
 where pk <= 10
) as t1
left join t2 on t1.pk = t2.fk

Ole

Thu, Jan 5 2006 9:38 AMPermanent Link

"Robert"

"Ole Willy Tuv" <owtuv@online.no> wrote in message
news:E429D22C-A7E8-4EB6-9D41-7E54E6C7976C@news.elevatesoft.com...
> Sorry, I hit the button before the message was finished.
>
> Another example is:
>
> select col1, (col1 + col2), sum(col3)
> from t
> group by col1, col2
> having col2 = 10
>
> DBISAM chokes on the col2 reference both in the GROUP BY and the HAVING

Shouldn't it be "where col2 = 10"? My understanding of "having" was that is
should be used for aggregated fields.

Robert


Thu, Jan 5 2006 10:10 AMPermanent Link

"Ole Willy Tuv"
Robert,

The HAVING clause can specify either grouping columns or aggregate
functions. Aggregate functions may reference any column belonging to the
table expression, since they're aggregating values within each group.

It may well be that a similar WHERE clause condition would have the same
effect in this particular query. I was explicitly using the HAVING clause to
illustrate the flaw in DBISAM.

Ole

Thu, Jan 5 2006 12:04 PMPermanent Link

"Robert"

"Ole Willy Tuv" <owtuv@online.no> wrote in message
news:58BF0FF3-68E5-44F0-B58E-AD63F51A54F7@news.elevatesoft.com...
>
> The HAVING clause can specify either grouping columns or aggregate
> functions. Aggregate functions may reference any column belonging to the
> table expression, since they're aggregating values within each group.
>

Thanks, I didn't know that. Seems that in the case where you filter grouping
columns, HAVING and WHERE are equivalent.

Robert

Thu, Jan 5 2006 4:03 PMPermanent Link

"Ole Willy Tuv"
Tim,

> select t1.*
> from t1
> left join t2 on t1.pk = t2.fk
> where t1.pk <= 10

While DBISAM does a good job of optimizing the above query, it does not
optimize a similar query with the filter condition defined on the right
table:

select t1.*
from t1
left join t2 on t1.pk = t2.fk
where t2.fk <= 10

This query executes in 8 seconds, while the original query executes in 15
milliseconds.

I might be wrong, but since this is an equi join, I'd think it's safe to
optimize the query by applying the filter condition on both tables:

select t1.*
from (
 select *
 from t1
 where pk <= 10
) as t1
left join (
 select *
 from t2
 where fk <= 10
) as t2 on t1.pk = t2.fk

Anyhow, other engines do optimize the second query equally efficient as with
the original query, so it's surely doable.

Ole

Thu, Jan 5 2006 5:06 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Ole,

<< The fact that this query doesn't work illustrates that the query
evaluation logic in DBISAM has serious flaws. >>

You're overstating the case here, and frankly, I don't have the time right
now to spend 20 or so messages arguing the point.  The fact is, DBISAM
requires you to put any grouped, ordered, or having columns in the SELECT
list.  Love it or hate it, that's the case.

--
Tim Young
Elevate Software
www.elevatesoft.com

« Previous PagePage 3 of 5Next Page »
Jump to Page:  1 2 3 4 5
Image