Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 7 of 7 total
Thread Speed issues
Sun, Nov 19 2006 11:45 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

I'm probably doing something wrong. I thought sql would be faster but navigationaly I can do the job far faster (sql c6s table c94ms). Anyone have any suggestion?

UPDATE Bayesian SET _Spam = _Spam + _Count, _NonSpam = _NonSpam - _Count,
_Total = _Spam + _NonSpam,
_Probability = COALESCE((_Spam / 453 / ((2 * _NonSpam )/ 3705 + _Spam / 453)),0)
FROM Bayesian JOIN "Memory\TokenCounts" on  "Memory\TokenCounts"._Token = _Token


while not bay2.Eof do begin <<< disk based version of "Memory\TokenCounts"
if Bayesian.FindKey([bay2.fieldbyname('_Token').asstring]) then begin
Bayesian.Edit;
Bayesian.FieldByName('_Spam').AsInteger := Bayesian.FieldByName('_Spam').AsInteger + Bay2.fieldbyname('_count').AsInteger;
Bayesian.FieldByName('_NonSpam').AsInteger := Bayesian.FieldByName('_NonSpam').AsInteger - Bay2.fieldbyname('_count').AsInteger;
Bayesian.FieldByName('_Total').AsInteger := Bayesian.FieldByName('_Spam').AsInteger + Bayesian.FieldByName('_NonSpam').AsInteger;
if Bayesian.FieldByName('_Total').AsInteger > 0 then begin
Bayesian.FieldByName('_probability').AsFloat :=
(Bayesian.FieldByName('_Spam').AsInteger / 453)
/
((2 * Bayesian.FieldByName('_NonSpam').AsInteger) / 3705 + Bayesian.FieldByName('_Spam').AsInteger / 453)
end else Bayesian.FieldByName('_probability').AsInteger := 0;
Bayesian.Post;
end;
Bay2.Next;
end;

Time taken c94 ms




================================================================================
SQL statement (Executed with 4.24 Build 1)
================================================================================

UPDATE Bayesian SET _Spam = _Spam + _Count, _NonSpam = _NonSpam - _Count,
_Total = _Spam + _NonSpam,
_Probability = COALESCE((_Spam / 453 / ((2 * _NonSpam )/ 3705 + _Spam / 453)),
0)
FROM Bayesian JOIN "Memory\TokenCounts" on  "Memory\TokenCounts"._Token =
_Token

Tables Involved
---------------

Bayesian (Bayesian) table opened shared, has 165352 rows
TokenCounts (TokenCounts) table opened shared, has 384 rows

Join Ordering
-------------

The driver table is the Bayesian table (Bayesian)

The Bayesian table (Bayesian) is joined to the TokenCounts table (TokenCounts)
with the INNER JOIN expression:

_Token = "Memory\TokenCounts"._Token

Optimizer will attempt to re-order the joins to a more optimal order
Use the NOJOINOPTIMIZE clause at the end of the SQL statement to force the
optimizer to leave the joins in their declared order

The joins are already in optimal order and cannot be optimized any further

The expression:

_Token = "Memory\TokenCounts"._Token

is OPTIMIZED

================================================================================
>>>>> 384 rows affected in 6.172 seconds
================================================================================






Roy Lambert
Mon, Nov 20 2006 5:02 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Roy,

<< I'm probably doing something wrong. I thought sql would be faster but
navigationaly I can do the job far faster (sql c6s table c94ms). Anyone have
any suggestion?

UPDATE Bayesian SET _Spam = _Spam + _Count, _NonSpam = _NonSpam - _Count,
_Total = _Spam + _NonSpam,
_Probability = COALESCE((_Spam / 453 / ((2 * _NonSpam )/ 3705 + _Spam /
453)),0)
FROM Bayesian JOIN "Memory\TokenCounts" on "Memory\TokenCounts"._Token =
_Token >>

The SQL has to process the UPDATE by joining the Bayesian table to the
in-memory table, i.e. it has to scan all 165352 rows, whereas the
navigational code only has to navigate 384 rows.  This is a limitation of
UPDATEs with joins, i.e. the first driver table in the joins must be the
UPDATE table, and the optimizer cannot flip them around.

--
Tim Young
Elevate Software
www.elevatesoft.com

Tue, Nov 21 2006 4:30 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Tim


That's pretty much the conclusion I'd come to. Thank TIM for TDBISAMTables

Roy Lambert
Tue, Nov 21 2006 8:07 AMPermanent Link

"Robert"

"Tim Young [Elevate Software]" <timyoung@elevatesoft.com> wrote in message
news:891212AA-C06F-4324-BFEB-96805271D3F5@news.elevatesoft.com...
> Roy,
>
> << I'm probably doing something wrong. I thought sql would be faster but
> navigationaly I can do the job far faster (sql c6s table c94ms). Anyone
> have any suggestion?
>
> UPDATE Bayesian SET _Spam = _Spam + _Count, _NonSpam = _NonSpam - _Count,
> _Total = _Spam + _NonSpam,
> _Probability = COALESCE((_Spam / 453 / ((2 * _NonSpam )/ 3705 + _Spam /
> 453)),0)
> FROM Bayesian JOIN "Memory\TokenCounts" on "Memory\TokenCounts"._Token =
> _Token >>
>
> The SQL has to process the UPDATE by joining the Bayesian table to the
> in-memory table, i.e. it has to scan all 165352 rows, whereas the


Is this true even if both tables have the correct indexes?

Robert

Tue, Nov 21 2006 8:23 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Robert


They're both indexed on _Token

Roy Lambert
Tue, Nov 21 2006 10:03 AMPermanent Link

"Robert"

"Roy Lambert" <roy.lambert@skynet.co.uk> wrote in message
news:8132682A-7867-4365-AE8B-51E517B14F6A@news.elevatesoft.com...
> Robert
>
>
> They're both indexed on _Token
>

Shame, it would be nice if DBISAM optimized the JOIN.

Robert


Tue, Nov 21 2006 1:30 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Robert,

<< Is this true even if both tables have the correct indexes? >>

Yes.  The join re-ordering is turned off for the driver table with respect
to the UPDATE and DELETE statements.

--
Tim Young
Elevate Software
www.elevatesoft.com

Image