Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 10 of 11 total
Thread Unicode Table joins using GUID are extremely slow
Sat, Aug 25 2012 9:17 PMPermanent Link

Barry

I have 2 tables (14k rows each) and any time I try and join 2 columns of the tables together by using the GUID columns File_Id (indexed), it will take a few hours!

select t1.file_id, t2.file_id, from table1 t1, table2 t2 where t1.file_id=t2.file_id

This should only take 5 seconds. Both tables have an index on file_id. I have also optimized the tables. It seems when GUID's are used in a table join, I might as well go home. Maybe the problem is Unicode related?

Has anyone else run into this problem with GUID in table joins on Unicode tables?

TIA
Barry
v2.09 with 3gb RAM on i5 machine
Sat, Aug 25 2012 11:29 PMPermanent Link

Barry

Here is the Execution Plan. What sticks out is the Total Rows Visited=215,752,032 which = 14689 x 14688 which are the # of rows in each table. So it appears it is NOT using the ix_FileId index at all (on the File_Id GUID column), and for each File_Id in Table1, it is searching for a corresponding File_Id in Table2 by traversing the entire table row by row (without using the ix_FileId index). No wonder it is slowww.

I have rebuilt the index in each table using:
CREATE INDEX "ix_FileId" ON "Table1" ("File_Id" COLLATE "UNI");
CREATE INDEX "ix_FileId" ON "Table2" ("File_Id" COLLATE "UNI");

Still it takes over an hour, or 14688x slower than it should be. How do I force it to use the index ix_FileId for the table join?

TIA
Barry

EDBMgr v2.09 B1

================================================================================
SQL Query (Executed by ElevateDB 2.09 Build 1)

Note: The SQL  shown here is generated by ElevateDB and may not be exactly the
same as the SQL that was originally entered.  However, none of the differences
alter the execution results in any way.
================================================================================

SELECT ALL
"t1"."file_id" AS "file_id",
"t2"."file_id" AS "file_id1"
FROM "Table1" AS "t1", "Table2" AS "t2"
WHERE "t1"."file_id" = "t2"."file_id"

Source Tables
-------------

Table1 (t1): 14689 rows
Table2 (t2): 14688 rows

Result Set
----------

The result set was insensitive and read-only
The result set consisted of zero or more rows

Filtering
---------

The following filter condition was applied to the result set rows as they were
generated:

"t1"."file_id" = "t2"."file_id"

Joins
-----

The driver table was the Table1 (t1) table

The optimizer attempted to re-order the joins to a more optimal order

The joins were already in the most optimal order

Result set I/O statistics
-------------------------

Total rows visited: 215752032

Row buffer manager

Max buffer size: 1048432 Buffer size: 2585088

Hits: 14688   Misses: 0   Hit ratio: 1

Reads: 0   Bytes read: 0

Writes: 0   Bytes written: 0

Index Page buffer manager

Max buffer size: 2097152 Buffer size: 442368

Hits: 28883   Misses: 0   Hit ratio: 1

Reads: 0   Bytes read: 0

Writes: 0   Bytes written: 0

================================================================================
14688 row(s) returned in 4409.821 secs
================================================================================
Sun, Aug 26 2012 12:03 AMPermanent Link

Raul

Team Elevate Team Elevate

Barry,

This is a full join so no wonder it's slow.

How about this SQL  :

SELECT
"t1"."file_id" AS "file_id",
"t2"."file_id" AS "file_id1"
FROM "Table1" AS "t1" INNER JOIN "Table2" AS "t2"
ON "t1"."file_id" = "t2"."file_id"

Raul


On 8/25/2012 11:29 PM, Barry wrote:
> Here is the Execution Plan. What sticks out is the Total Rows Visited=215,752,032 which = 14689 x 14688 which are the # of rows in each table. So it appears it is NOT using the ix_FileId index at all (on the File_Id GUID column), and for each File_Id in Table1, it is searching for a corresponding File_Id in Table2 by traversing the entire table row by row (without using the ix_FileId index). No wonder it is slowww.
>
> I have rebuilt the index in each table using:
> CREATE INDEX "ix_FileId" ON "Table1" ("File_Id" COLLATE "UNI");
> CREATE INDEX "ix_FileId" ON "Table2" ("File_Id" COLLATE "UNI");
>
> Still it takes over an hour, or 14688x slower than it should be. How do I force it to use the index ix_FileId for the table join?
>
> TIA
> Barry
>
> EDBMgr v2.09 B1
>
> ================================================================================
> SQL Query (Executed by ElevateDB 2.09 Build 1)
>
> Note: The SQL  shown here is generated by ElevateDB and may not be exactly the
> same as the SQL that was originally entered.  However, none of the differences
> alter the execution results in any way.
> ================================================================================
>
> SELECT ALL
> "t1"."file_id" AS "file_id",
> "t2"."file_id" AS "file_id1"
> FROM "Table1" AS "t1", "Table2" AS "t2"
> WHERE "t1"."file_id" = "t2"."file_id"
>
> Source Tables
> -------------
>
> Table1 (t1): 14689 rows
> Table2 (t2): 14688 rows
>
> Result Set
> ----------
>
> The result set was insensitive and read-only
> The result set consisted of zero or more rows
>
> Filtering
> ---------
>
> The following filter condition was applied to the result set rows as they were
> generated:
>
> "t1"."file_id" = "t2"."file_id"
>
> Joins
> -----
>
> The driver table was the Table1 (t1) table
>
> The optimizer attempted to re-order the joins to a more optimal order
>
> The joins were already in the most optimal order
>
> Result set I/O statistics
> -------------------------
>
> Total rows visited: 215752032
>
> Row buffer manager
>
> Max buffer size: 1048432 Buffer size: 2585088
>
> Hits: 14688   Misses: 0   Hit ratio: 1
>
> Reads: 0   Bytes read: 0
>
> Writes: 0   Bytes written: 0
>
> Index Page buffer manager
>
> Max buffer size: 2097152 Buffer size: 442368
>
> Hits: 28883   Misses: 0   Hit ratio: 1
>
> Reads: 0   Bytes read: 0
>
> Writes: 0   Bytes written: 0
>
> ================================================================================
> 14688 row(s) returned in 4409.821 secs
> ================================================================================
>
Sun, Aug 26 2012 12:22 AMPermanent Link

Barry

Raul wrote:

>Barry,

>This is a full join so no wonder it's slow.

>How about this SQL  :

>SELECT
>"t1"."file_id" AS "file_id",
>"t2"."file_id" AS "file_id1"
>FROM "Table1" AS "t1" INNER JOIN "Table2" AS "t2"
>ON "t1"."file_id" = "t2"."file_id"

Raul,
   I had tried that and was just going to post the results when I saw your reply. Using the inner join dropped it down to 17 seconds which is pretty good. Not as fast as the 5 seconds I was hoping for, but fast enough. The inner-join on the GUID column was almost 2x faster than an inner-join on an Integer column (32 seconds), which was surprising.

The reason I used:
  select t1.col1, t2.col2 from table1 t1, table2 t2 where t1.col1=t2.col2

is because that is how MySQL executes an equi-join. It is somewhat similar to an inner-join except MySQL can use either the t2.col2 index or t1.col1 index. From reading the EDB manual a few minutes ago, I see the EDB inner-join can also choose the optimal index for the inner-join which is great.

All I have to do now is remember the "new" inner join syntax and stop "equi-joins" like I've been doing for the past 10 years. Smile

Barry

BTW, MySQL doesn't have a full outer join so I think that's what threw me.
Sun, Aug 26 2012 1:29 PMPermanent Link

Raul

Team Elevate Team Elevate

Hi Barry,

The join syntax is pretty standard so i do use the "JOIN table ON ..."
syntax with the MySQL on the times i need to. This way all my sql always
remains fairly similar between different back ends.

I'm not sure but i think the MySQL optimizer likely resolves it back to
proper inner join join if you just use where clause like you did - it
might be worthwhile to email elevate support and see if Tim can
implement something like this in optimizer as well.

There is no full outer join in EDB either - only left or right.

Raul


On 8/26/2012 12:22 AM, Barry wrote:

> All I have to do now is remember the "new" inner join syntax and stop "equi-joins" like I've been doing for the past 10 years. Smile
>
> Barry
>
> BTW, MySQL doesn't have a full outer join so I think that's what threw me.
>
Tue, Aug 28 2012 1:20 AMPermanent Link

Barry

Raul wrote:

>>This is a full join so no wonder it's slow.

Why would a full join need to traverse 215,752,032 rows? Maybe if it were trying to do a cartesian product but my join only returned 14,688 rows when it finished. Is it a bug?

Barry
Tue, Aug 28 2012 4:08 AMPermanent Link

Fernando Dias

Team Elevate Team Elevate

Barry,

In fact, this syntax produces a CROSS JOIN, that is the Cartesian product of the 2 tables.

--
Fernando Dias
[Team Elevate]
Tue, Aug 28 2012 6:05 AMPermanent Link

John Hay

Barry

>     I had tried that and was just going to post the results when I saw your reply. Using the inner join dropped it
down to 17 seconds which is pretty good. Not as fast as the 5 seconds I was hoping for, but fast enough. The inner-join
on the GUID column was almost 2x faster than an inner-join on an Integer column (32 seconds), which was surprising.

Is this with tables with circa 14K rows?  If so this seems to be very slow indeed.  I tried it on an ancient (2003!!)
laptop with 17K records and the guid query took just over 2 seconds and the integer less than 1.5.

John

Tue, Aug 28 2012 10:24 AMPermanent Link

Raul

Team Elevate Team Elevate

What Fernando said - the join is cartesian but the where clause then
filters it down again to only matching rows.

So if there are 14,688 matching GUID's between tables it's not a bug but
just result of where filter.

Raul


On 8/28/2012 1:20 AM, Barry wrote:
> Raul wrote:
>
>>> This is a full join so no wonder it's slow.
>
> Why would a full join need to traverse 215,752,032 rows? Maybe if it were trying to do a cartesian product but my join only returned 14,688 rows when it finished. Is it a bug?
>
> Barry
>
Tue, Aug 28 2012 10:43 AMPermanent Link

Barry

Raul wrote:

>What Fernando said - the join is cartesian but the where clause then
>filters it down again to only matching rows.

>So if there are 14,688 matching GUID's between tables it's not a bug but
>just result of where filter.

Fernando & Raul,
   Ok, that makes sense. Thanks.

Barry
Page 1 of 2Next Page »
Jump to Page:  1 2
Image