Support Forums - View Thread

View Thread

The following is the text of the current message along with any replies.

Messages 1 to 5 of 5 total

Sun, Sep 30 2012 9:18 PM	Permanent Link
Barry	I have a problem with sub-selects not summing the proper number of rows. Here are the two tables: CREATE TABLE "Company" ( "PL_Id" INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 0, INCREMENT BY 1), "Num_People" INTEGER DEFAULT 0, "Start_Date" DATE ) CREATE TABLE "Expenses" ( "PL_Id" INTEGER NOT NULL, "Expense_Code" VARCHAR(5) COLLATE "UNI_CI_AI_KI_WI" NOT NULL, "Expense_Amount" FLOAT DEFAULT 0 NOT NULL ) INSERT INTO "Company" VALUES (1, 100, DATE ''2011-01-01''); INSERT INTO "Company" VALUES (2, 150, DATE ''2011-01-01''); INSERT INTO "Company" VALUES (3, 355, DATE ''2011-01-01''); INSERT INTO "Company" VALUES (4, 500, DATE ''2011-02-01''); INSERT INTO "Company" VALUES (5, 422, DATE ''2011-02-01''); INSERT INTO "Company" VALUES (6, 20, DATE ''2011-03-01''); INSERT INTO "Company" VALUES (7, 333, DATE ''2011-02-01''); INSERT INTO "Company" VALUES (8, 44, DATE ''2011-03-01''); INSERT INTO "Expenses" VALUES (1, ''CC'', 10); INSERT INTO "Expenses" VALUES (2, ''CC'', 23); INSERT INTO "Expenses" VALUES (3, ''CC'', 100); INSERT INTO "Expenses" VALUES (4, ''FC'', 1000); INSERT INTO "Expenses" VALUES (2, ''FC'', 2200); INSERT INTO "Expenses" VALUES (1, ''FC'', 2000); INSERT INTO "Expenses" VALUES (3, ''FC'', 3333); INSERT INTO "Expenses" VALUES (5, ''CC'', 309); INSERT INTO "Expenses" VALUES (6, ''FC'', 5500); INSERT INTO "Expenses" VALUES (7, ''CC'', 44); INSERT INTO "Expenses" VALUES (7, ''CC'', 90); INSERT INTO "Expenses" VALUES (8, ''FC'', 10000); select * from company; PL_Id Num_People Start_Date 1 100 2011-01-01 2 150 2011-01-01 3 355 2011-01-01 4 500 2011-01-01 5 422 2011-02-01 6 20 2011-03-01 7 333 2011-02-01 8 44 2011-03-01 select * from expenses PL_Id Expense_Code Expense_Amount 1 CC 10 2 CC 23 3 CC 100 4 FC 1000 2 FC 2200 1 FC 2000 3 FC 3333 5 CC 309 6 FC 5500 7 CC 44 7 CC 90 8 FC 10000 select start_date, sum(num_people) as Total_Num_People, count() as Num_Company_Rcds, (select count() from expenses where expenses.pl_id=company.pl_id) as Num_Expense_Rcds from company group by start_date Start_Date Total_Num_People Num_Company_Rcds Num_Expense_Rcds 2011-01-01 605 3 2 2011-02-01 1255 3 2 2011-03-01 64 2 1 There are 16 rows in the Expenses table, yet the Select statement counts only 5 Expense records. Why? If I change it to "Group By PL_Id" then it finds all Expense rows. So it appears the subselect will only return the correct number of rows if the Subselect where clause uses the same column as the Group By statement. Is this a bug? Or am I missing something? TIA Barry ================================================================================ SQL Query (Executed by ElevateDB 2.10 Build 1) Note: The SQL shown here is generated by ElevateDB and may not be exactly the same as the SQL that was originally entered. However, none of the differences alter the execution results in any way. ================================================================================ SELECT ALL "start_date" AS "start_date", SUM("num_people") AS "Total_Num_People", COUNT("") AS "Num_Company_Rcds", (SELECT ALL COUNT("") AS "COUNT of ALL" FROM "expenses" WHERE "expenses"."pl_id" = "company"."pl_id") AS "Num_Expense_Rcds" FROM "company" GROUP BY "start_date" Source Tables ------------- company: 8 rows Result Set ---------- The result set was insensitive and read-only The result set consisted of zero or more rows Result set I/O statistics ------------------------- Total rows visited: 8 Row buffer manager Max buffer size: 1048560 Buffer size: 144 Hits: 18 Misses: 0 Hit ratio: 1 Reads: 0 Bytes read: 0 Writes: 0 Bytes written: 0 Index Page buffer manager Max buffer size: 2097152 Buffer size: 8192 Hits: 10 Misses: 0 Hit ratio: 1 Reads: 0 Bytes read: 0 Writes: 0 Bytes written: 0 ================================================================================ 3 row(s) returned in 0 secs ================================================================================
Sun, Sep 30 2012 10:55 PM	Permanent Link
Raul Team Elevate	Barry, On 9/30/2012 9:18 PM, Barry wrote: > I have a problem with sub-selects not summing the proper number of rows. > There are 16 rows in the Expenses table, yet the Select statement counts only 5 Expense records. Why? > If I change it to "Group By PL_Id" then it finds all Expense rows. So it appears the subselect will only return the correct number of rows if the Subselect where clause uses the same column as the Group By statement. > Is this a bug? Or am I missing something? I'm no expert but AFAIK mixing the group by and subselect should honour the group by what you're seeing - Tim would need advise on how subselect should work in this case exactly. Since you're using group by on the date you're effectively reducing the company table to 3 returned rows so the subselect really only uses the 3 pl_id's and ignores others. I ran "select * from company group by start_date" i get back pl_Id's 3,7, and 8 and those do happen to have 2,2 and 1 expense rows. Raul
Sun, Sep 30 2012 11:12 PM	Permanent Link
Fernando Dias Team Elevate	Barry, I can count only 12 and not 16 rows in expenses, but thats not the problem. Anyway, your query is quite ambiguous ... For example, what company.pl_id value do you expect EDB to use in the sub select when date='2011-01-01' ? There are 3 different company.pl_id values for that date... the same idea applies for the other dates. Let's suppose EDB is taking the last occurrence of company.pl_id when grouping: 3 for '2011-01-01' --> the count in "expenses" would be 2 7 for '2011-02-01' --> the count in "expenses" would be 2 8 for '2011-03-01' --> the count in "expenses" would be 1 All seems to be correct, this is exactly what EDB is returning to you. So, no it's not a bug, it's how EDB deals with this kind of ambiguous queries with GROUP BY. -- Fernando Dias [Team Elevate]
Mon, Oct 1 2012 12:21 AM	Permanent Link
Barry	Raul, >Since you're using group by on the date you're effectively reducing the >company table to 3 returned rows so the subselect really only uses the 3 >pl_id's and ignores others. >I ran "select * from company group by start_date" i get back pl_Id's >3,7, and 8 and those do happen to have 2,2 and 1 expense rows. Correct. It looks like the SubSelect is summarizing the Expenses rows for the 3 distinct company.pl_id that appear in the final result set, and NOT for all the rows in the Company table. I would have thought the Subselect should have gone through all the rows in the Company table. So I replicated the tables in MySQL and sure enough, MySQL gives me the same results. I guess this is the way the subselects are suppose to works. Who would have thunk it? To solve the problem I created a view on Expenses that joins Company.Start_Date and the subselect now uses something like "Select ..., (Select sum(Expense_Amount) from Expenses where Expenses.Start_Date = Company.Start_Date)" and that produces the correct results. So the moral of this sad SubSelect story is, the Where clause of the Subselect has to match the Group By column of the outer Select statement if it is to summarize the inner rows properly. I learned something new today. Barry
Mon, Oct 1 2012 12:29 AM	Permanent Link
Barry	Fernando Dias wrote: > Anyway, your query is quite ambiguous ... > For example, what company.pl_id value do you expect EDB to use in the sub select when date='2011-01-01' ? I had erroneously thought the Subselect would execute first and join all Company.PL_Id=Expenses.PL_Id then do the Group By on the result. But that is not the case. EDB is calculating the results correctly (at least it agrees with MySQL). To solve the problem I created a view on the Expenses table so it has the Start_Date and now it can match this date with the Group By Start_Date and give me the correct totals. Barry