Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 10 of 10 total
Thread Strange JOIN phenomena
Sat, May 18 2013 5:34 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

I messed up a simple query

SELECT
_Name,
_fkSandT_Staffing,
S1._Staff,
_fkSandT_Turnover,
S2._Turnover,
_ID
FROM
Companies
JOIN SandT S1 ON S1._Band = _fkSandT_Staffing
JOIN SandT S2 ON S2._Band = _fkSandT_Turnover

I left out the , after S1._Turnover so that _ID was treated as a correlation name. In both cases I get the same number of rows returned (6849) BUT the sequence is different.

Why should the columns included in the result set influence the sequence of the rows?


Roy Lambert
Sat, May 18 2013 6:04 AMPermanent Link

Fernando Dias

Team Elevate Team Elevate

Roy,

Can you post the query plans ?
They might contain the answer...

--
Fernando Dias
[Team Elevate]
Sat, May 18 2013 8:49 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Fernando

>Can you post the query plans ?
>They might contain the answer...

Trust me - they don't


SELECT ALL
"_Name" AS "_Name",
"_fkSandT_Staffing" AS "_fkSandT_Staffing",
"S1"."_Staff" AS "_Staff",
"_fkSandT_Turnover" AS "_fkSandT_Turnover",
"S2"."_Turnover" AS "_Turnover",
"_ID" AS "_ID"
FROM "Companies" INNER JOIN "SandT" AS "S1" ON "S1"."_Band" =
"_fkSandT_Staffing"
INNER JOIN "SandT" AS "S2" ON "S2"."_Band" = "_fkSandT_Turnover"


SELECT ALL
"_Name" AS "_Name",
"_fkSandT_Staffing" AS "_fkSandT_Staffing",
"S1"."_Staff" AS "_Staff",
"_fkSandT_Turnover" AS "_fkSandT_Turnover",
"S2"."_Turnover" AS "_ID"
FROM "Companies" INNER JOIN "SandT" AS "S1" ON "S1"."_Band" =
"_fkSandT_Staffing"
INNER JOIN "SandT" AS "S2" ON "S2"."_Band" = "_fkSandT_Turnover"

Everything else identical.

Roy Lambert
Sat, May 18 2013 10:52 AMPermanent Link

Fernando Dias

Team Elevate Team Elevate

Roy,

Then, I don't know...
I was guessing they could be using different indexes to scan the tables or even one of them using row scan, and as by definition, in SQL the order of the rows is irrelevant...

--
Fernando Dias
[Team Elevate]
Sat, May 18 2013 11:19 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Fernando


I can't see anything, but just in case you can spot something I've missed

================================================================================
SQL Query (Executed by ElevateDB 2.12 Build 1)

Note: The SQL  shown here is generated by ElevateDB and may not be exactly the
same as the SQL that was originally entered.  However, none of the differences
alter the execution results in any way.
================================================================================

SELECT ALL
"_Name" AS "_Name",
"_fkSandT_Staffing" AS "_fkSandT_Staffing",
"S1"."_Staff" AS "_Staff",
"_fkSandT_Turnover" AS "_fkSandT_Turnover",
"S2"."_Turnover" AS "_ID"
FROM "Companies" INNER JOIN "SandT" AS "S1" ON "S1"."_Band" =
"_fkSandT_Staffing"
INNER JOIN "SandT" AS "S2" ON "S2"."_Band" = "_fkSandT_Turnover"

Source Tables
-------------

Companies: 6849 rows
SandT (S1): 15 rows
SandT (S2): 15 rows

Result Set
----------

The result set was insensitive and read-only
The result set consisted of zero or more rows

Joins
-----

The driver table was the Companies table

The Companies table was joined to the SandT (S1) table with the inner join
expression:

"S1"."_Band" = "_fkSandT_Staffing"

The Companies table was joined to the SandT (S2) table with the inner join
expression:

"S2"."_Band" = "_fkSandT_Turnover"

The optimizer attempted to re-order the joins to a more optimal order

The optimizer successfully re-ordered the joins into this more optimal order:

The driver table was the SandT (S1) table

The SandT (S1) table was joined to the Companies table with the inner join
expression:

"_fkSandT_Staffing" = "S1"."_Band"

The Companies table was joined to the SandT (S2) table with the inner join
expression:

"S2"."_Band" = "_fkSandT_Turnover"

The following join condition was applied to the Companies table:

"_fkSandT_Staffing" = "S1"."_Band"

Row scan (Companies)

Hint: Create index(es) (Companies) on column(s) "_fkSandT_Staffing" for
possible better performance

The following join condition was applied to the SandT (S2) table:

"S2"."_Band" = "_fkSandT_Turnover"

Index scan (SandT.PK)


Result set I/O statistics
-------------------------

Total rows visited: 6849

Row buffer manager

Max buffer size: 1023.91KB Buffer size: 1.05MB

Hits: 6849   Misses: 0   Hit ratio: 1

Reads: 0   read: 0B

Writes: 0   written: 0B

Index Page buffer manager

Max buffer size: 2MB Buffer size: 204KB

Hits: 13475   Misses: 0   Hit ratio: 1

Reads: 0   read: 0B

Writes: 0   written: 0B

================================================================================
6849 row(s) returned in 2.465 secs
================================================================================


================================================================================
SQL Query (Executed by ElevateDB 2.12 Build 1)

Note: The SQL  shown here is generated by ElevateDB and may not be exactly the
same as the SQL that was originally entered.  However, none of the differences
alter the execution results in any way.
================================================================================

SELECT ALL
"_Name" AS "_Name",
"_fkSandT_Staffing" AS "_fkSandT_Staffing",
"S1"."_Staff" AS "_Staff",
"_fkSandT_Turnover" AS "_fkSandT_Turnover",
"S2"."_Turnover" AS "_Turnover",
"_ID" AS "_ID"
FROM "Companies" INNER JOIN "SandT" AS "S1" ON "S1"."_Band" =
"_fkSandT_Staffing"
INNER JOIN "SandT" AS "S2" ON "S2"."_Band" = "_fkSandT_Turnover"

Source Tables
-------------

Companies: 6849 rows
SandT (S1): 15 rows
SandT (S2): 15 rows

Result Set
----------

The result set was insensitive and read-only
The result set consisted of zero or more rows

Joins
-----

The driver table was the Companies table

The Companies table was joined to the SandT (S1) table with the inner join
expression:

"S1"."_Band" = "_fkSandT_Staffing"

The Companies table was joined to the SandT (S2) table with the inner join
expression:

"S2"."_Band" = "_fkSandT_Turnover"

The optimizer attempted to re-order the joins to a more optimal order

The optimizer successfully re-ordered the joins into this more optimal order:

The driver table was the SandT (S1) table

The SandT (S1) table was joined to the Companies table with the inner join
expression:

"_fkSandT_Staffing" = "S1"."_Band"

The Companies table was joined to the SandT (S2) table with the inner join
expression:

"S2"."_Band" = "_fkSandT_Turnover"

The following join condition was applied to the Companies table:

"_fkSandT_Staffing" = "S1"."_Band"

Row scan (Companies)

Hint: Create index(es) (Companies) on column(s) "_fkSandT_Staffing" for
possible better performance

The following join condition was applied to the SandT (S2) table:

"S2"."_Band" = "_fkSandT_Turnover"

Index scan (SandT.PK)


Result set I/O statistics
-------------------------

Total rows visited: 6849

Row buffer manager

Max buffer size: 1023.91KB Buffer size: 1.1MB

Hits: 6849   Misses: 0   Hit ratio: 1

Reads: 0   read: 0B

Writes: 0   written: 0B

Index Page buffer manager

Max buffer size: 2MB Buffer size: 156KB

Hits: 13471   Misses: 0   Hit ratio: 1

Reads: 0   read: 0B

Writes: 0   written: 0B

================================================================================
6849 row(s) returned in 2.527 secs
================================================================================


Roy Lambert
Sat, May 18 2013 11:30 AMPermanent Link

Fernando Dias

Team Elevate Team Elevate

Roy,

Nope, I can't spot any difference, apart from irrelevant things like buffer sizes...

--
Fernando Dias
[Team Elevate]
Sat, May 18 2013 12:22 PMPermanent Link

Raul

Team Elevate Team Elevate

Roy,

Any chance your _ID column is a (primary) index column.

My only guess would be that since you did not ask for any specific
ordering then including the _ID column resulted in EDB using the _ID
index vs just dong a natural read (no idea why exec plan does not show
it but since it might be used just to read the table recs it might not
present in plan due to not being needed for anything else).

All of this wild speculation of course

Raul

On 5/18/2013 5:34 AM, Roy Lambert wrote:
> I messed up a simple query
>
> SELECT
> _Name,
> _fkSandT_Staffing,
> S1._Staff,
> _fkSandT_Turnover,
> S2._Turnover,
> _ID
> FROM
> Companies
> JOIN SandT S1 ON S1._Band = _fkSandT_Staffing
> JOIN SandT S2 ON S2._Band = _fkSandT_Turnover
>
> I left out the , after S1._Turnover so that _ID was treated as a correlation name. In both cases I get the same number of rows returned (6849) BUT the sequence is different.
>
> Why should the columns included in the result set influence the sequence of the rows?
>
>
> Roy Lambert
>
Sun, May 19 2013 3:20 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Raul

>All of this wild speculation of course

Welcome to the club Smiley

Roy
Mon, May 20 2013 2:47 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Roy,

<< I left out the , after S1._Turnover so that _ID was treated as a
correlation name. In both cases I get the same number of rows returned
(6849) BUT the sequence is different.

Why should the columns included in the result set influence the sequence of
the rows? >>

EDB tries to use the primary key of the driver table for ordering in order
to give a more "natural" order instead of the (possibly) random physical
order of the rows on disk.  This is only desirable if all of the columns in
the primary key of the driver table are being selected.  So, if in one case
you had all of the primary key columns present, then it would be ordered
differently than if you hadn't included them all.

Remember, unless you specify an ORDER BY, the row order is completely random
as far as you're concerned.  Whatever EDB does is simply to try and present
the rows in the most desirable way.

Tim Young
Elevate Software
www.elevatesoft.com
Tue, May 21 2013 6:08 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Tim

>EDB tries to use the primary key of the driver table for ordering in order
>to give a more "natural" order instead of the (possibly) random physical
>order of the rows on disk. This is only desirable if all of the columns in
>the primary key of the driver table are being selected. So, if in one case
>you had all of the primary key columns present, then it would be ordered
>differently than if you hadn't included them all.
>
>Remember, unless you specify an ORDER BY, the row order is completely random
>as far as you're concerned. Whatever EDB does is simply to try and present
>the rows in the most desirable way.

That explains it. Thanks.

Roy Lambert
Image