Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 5 of 5 total
Thread Why doesn't this SubSelect work?
Sun, Sep 30 2012 9:18 PMPermanent Link

Barry

I have a problem with sub-selects not summing the proper number of rows.
Here are the two tables:

CREATE TABLE "Company"
(
"PL_Id" INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 0, INCREMENT BY 1),
"Num_People" INTEGER DEFAULT 0,
"Start_Date" DATE
)


CREATE TABLE "Expenses"
(
"PL_Id" INTEGER NOT NULL,
"Expense_Code" VARCHAR(5) COLLATE "UNI_CI_AI_KI_WI" NOT NULL,
"Expense_Amount" FLOAT DEFAULT 0  NOT NULL
)

INSERT INTO "Company" VALUES (1, 100, DATE ''2011-01-01'');
INSERT INTO "Company" VALUES (2, 150, DATE ''2011-01-01'');
INSERT INTO "Company" VALUES (3, 355, DATE ''2011-01-01'');
INSERT INTO "Company" VALUES (4, 500, DATE ''2011-02-01'');
INSERT INTO "Company" VALUES (5, 422, DATE ''2011-02-01'');
INSERT INTO "Company" VALUES (6,  20, DATE ''2011-03-01'');
INSERT INTO "Company" VALUES (7, 333, DATE ''2011-02-01'');
INSERT INTO "Company" VALUES (8,  44, DATE ''2011-03-01'');
INSERT INTO "Expenses" VALUES (1, ''CC'',  10);
INSERT INTO "Expenses" VALUES (2,  ''CC'', 23);
INSERT INTO "Expenses" VALUES (3,  ''CC'', 100);
INSERT INTO "Expenses" VALUES (4,  ''FC'',  1000);
INSERT INTO "Expenses" VALUES (2,  ''FC'',  2200);
INSERT INTO "Expenses" VALUES (1,  ''FC'',  2000);
INSERT INTO "Expenses" VALUES (3,  ''FC'',  3333);
INSERT INTO "Expenses" VALUES (5,  ''CC'',  309);
INSERT INTO "Expenses" VALUES (6,  ''FC'', 5500);
INSERT INTO "Expenses" VALUES (7,  ''CC'', 44);
INSERT INTO "Expenses" VALUES (7,  ''CC'', 90);
INSERT INTO "Expenses" VALUES (8,  ''FC'', 10000);

select * from company;
PL_Id   Num_People   Start_Date
       1           100        2011-01-01
    2      150        2011-01-01
    3       355        2011-01-01
    4      500        2011-01-01
    5      422        2011-02-01
    6       20        2011-03-01
    7      333        2011-02-01
    8       44        2011-03-01
   
select * from expenses
PL_Id   Expense_Code   Expense_Amount
   1      CC             10
   2      CC             23
   3      CC            100
   4      FC           1000
   2      FC           2200
   1      FC           2000
   3      FC           3333
   5      CC            309
   6      FC           5500
   7      CC             44
   7      CC             90
   8      FC          10000
   
select start_date,
 sum(num_people) as Total_Num_People, count(*) as Num_Company_Rcds, (select count(*) from expenses where expenses.pl_id=company.pl_id) as Num_Expense_Rcds
 from company group by start_date
 
Start_Date   Total_Num_People   Num_Company_Rcds   Num_Expense_Rcds
2011-01-01       605            3               2
2011-02-01      1255            3               2
2011-03-01        64            2               1

There are 16 rows in the Expenses table, yet the Select statement counts only 5 Expense records. Why?
If I change it to "Group By PL_Id" then it finds all Expense rows. So it appears the subselect will only return the correct number of rows if the Subselect where clause uses the same column as the Group By statement.

Is this a bug? Or am I missing something?

TIA
Barry



================================================================================
SQL Query (Executed by ElevateDB 2.10 Build 1)

Note: The SQL  shown here is generated by ElevateDB and may not be exactly the
same as the SQL that was originally entered.  However, none of the differences
alter the execution results in any way.
================================================================================

SELECT ALL
"start_date" AS "start_date",
SUM("num_people") AS "Total_Num_People",
COUNT("*") AS "Num_Company_Rcds",
(SELECT ALL COUNT("*") AS "COUNT of ALL" FROM "expenses" WHERE
"expenses"."pl_id" = "company"."pl_id") AS "Num_Expense_Rcds"
FROM "company"
GROUP BY "start_date"

Source Tables
-------------

company: 8 rows

Result Set
----------

The result set was insensitive and read-only
The result set consisted of zero or more rows


Result set I/O statistics
-------------------------

Total rows visited: 8

Row buffer manager

Max buffer size: 1048560 Buffer size: 144

Hits: 18   Misses: 0   Hit ratio: 1

Reads: 0   Bytes read: 0

Writes: 0   Bytes written: 0

Index Page buffer manager

Max buffer size: 2097152 Buffer size: 8192

Hits: 10   Misses: 0   Hit ratio: 1

Reads: 0   Bytes read: 0

Writes: 0   Bytes written: 0

================================================================================
3 row(s) returned in 0 secs
================================================================================
Sun, Sep 30 2012 10:55 PMPermanent Link

Raul

Team Elevate Team Elevate

Barry,

On 9/30/2012 9:18 PM, Barry wrote:
> I have a problem with sub-selects not summing the proper number of rows.
> There are 16 rows in the Expenses table, yet the Select statement counts only 5 Expense records. Why?
> If I change it to "Group By PL_Id" then it finds all Expense rows. So it appears the subselect will only return the correct number of rows if the Subselect where clause uses the same column as the Group By statement.
> Is this a bug? Or am I missing something?

I'm no expert but AFAIK mixing the group by and subselect should honour
the group by what you're seeing - Tim would need advise on how subselect
should work in this case exactly.

Since you're using group by on the date you're effectively reducing the
company table to 3 returned rows so the subselect really only uses the 3
pl_id's and ignores others.

I ran "select * from company group by start_date" i get back pl_Id's
3,7, and 8 and those do happen to have 2,2 and 1 expense rows.

Raul


Sun, Sep 30 2012 11:12 PMPermanent Link

Fernando Dias

Team Elevate Team Elevate

Barry,

I can count only 12 and not 16 rows in expenses, but thats not the problem.
Anyway, your query is quite ambiguous ...
For example, what company.pl_id value do you expect EDB to use in the sub select when date='2011-01-01' ?
There are 3 different company.pl_id values for that date... the same idea applies for the other dates.

Let's suppose EDB is taking the last occurrence of company.pl_id when grouping:
   3 for '2011-01-01' --> the count in "expenses" would be 2
   7 for '2011-02-01' --> the count in "expenses" would be 2
   8 for '2011-03-01' --> the count in "expenses" would be 1

All seems to be correct, this is exactly what EDB is returning to you.
So, no it's not a bug, it's how EDB deals with this kind of ambiguous queries with GROUP BY.

--
Fernando Dias
[Team Elevate]
Mon, Oct 1 2012 12:21 AMPermanent Link

Barry

Raul,

>Since you're using group by on the date you're effectively reducing the
>company table to 3 returned rows so the subselect really only uses the 3
>pl_id's and ignores others.

>I ran "select * from company group by start_date" i get back pl_Id's
>3,7, and 8 and those do happen to have 2,2 and 1 expense rows.

Correct. It looks like the SubSelect is summarizing the Expenses rows for the 3 distinct company.pl_id that appear in the final result set, and NOT for all the rows in the Company table. I would have thought the Subselect should have gone through all the rows in the Company table.

So I replicated the tables in MySQL and sure enough, MySQL gives me the same results. I guess this is the way the subselects are suppose to works. Who would have thunk it?

To solve the problem I created a view on Expenses that joins Company.Start_Date and the subselect now uses something like "Select ..., (Select sum(Expense_Amount) from Expenses where Expenses.Start_Date = Company.Start_Date)" and that produces the correct results.

So the moral of this sad SubSelect story is, the Where clause of the Subselect has to match the Group By column of the outer Select statement if it is to summarize the inner rows properly.  I learned something new today. Smile

Barry
Mon, Oct 1 2012 12:29 AMPermanent Link

Barry

Fernando Dias wrote:

> Anyway, your query is quite ambiguous ...
> For example, what company.pl_id value do you expect EDB to use in the sub select when date='2011-01-01' ?

I had erroneously thought the Subselect would execute first and join all Company.PL_Id=Expenses.PL_Id then do the Group By on the result. But that is not the case. EDB is calculating the results correctly (at least it agrees with MySQL).

To solve the problem I created a view on the Expenses table so it has the Start_Date and now it can match this date with the Group By Start_Date and give me the correct totals.

Barry
Image