Very slow grouping |
Mon, Mar 19 2007 1:21 PM
"Ole Willy Tuv" | My big_table has 5 million rows. col10 is a TIMESTAMP column containing 5
groups of datetime values, 1 million rows each. There is an index on the col10 column. The following query: select col10, count(*) from big_table group by col10 takes 150 seconds to process the result with 5 groups/rows. I guess EDB doesn't use the index when grouping the table. Ole Willy Tuv |
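For reference, a minimal sketch of the setup being described. The table name, col10, and the existence of the index come from the post; the index name idx_col10 is hypothetical, since the thread never names it:

-- Index assumed by the discussion; the name idx_col10 is chosen here for illustration.
CREATE INDEX idx_col10 ON big_table (col10);

-- The grouping query that takes ~150 seconds over 5 million rows:
SELECT col10, COUNT(*)
FROM big_table
GROUP BY col10;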
Mon, Mar 19 2007 6:39 PM
Tim Young [Elevate Software]
Elevate Software, Inc.
timyoung@elevatesoft.com

Ole,

<< I guess EDB doesn't use the index when grouping the table. >>

Not at this time, no. Optimized ordering and grouping by index is scheduled for round 2.

--
Tim Young
Elevate Software
www.elevatesoft.com
Mon, Mar 19 2007 7:14 PM
Michael Baytalsky

Ole,

Without optimization, I'd expect this query to be processed at around 20-40 MB/sec, depending on the computer. What size of big_table are we talking about?

Michael

Ole Willy Tuv wrote:
> My big_table has 5 million rows. col10 is a TIMESTAMP column containing 5
> groups of datetime values, 1 million rows each. There is an index on the
> col10 column.
>
> The following query:
>
> select col10, count(*)
> from big_table
> group by col10
>
> takes 150 seconds to process the result with 5 groups/rows.
>
> I guess EDB doesn't use the index when grouping the table.
>
> Ole Willy Tuv
Mon, Mar 19 2007 7:27 PM
"Ole Willy Tuv" | Michael,
<< I suppose without optimization this query should be solved at speed around 20-40MB/sec depending on the computer. What size of the big_table are we talking about? >> The table has has 10 columns and 5 million rows, total size 1450 MB. There are 5 different col10 values, such that each group contains 1 million rows. With index support, the grouping would take only a few seconds. Ole Willy Tuv |
Tue, Mar 20 2007 6:07 AM
Michael Baytalsky

Ole,

> The table has 10 columns and 5 million rows, total size 1450 MB. There
> are 5 different col10 values, such that each group contains 1 million rows.

Well, then a plain scan should take around 50 secs with no optimization (1450 MB at ~30 MB/sec is about 48 secs). 150 secs sounds pretty slow even without any optimization or index.

> With index support, the grouping would take only a few seconds.

This really depends on the type of index and whether the query can actually be optimized in this case. I know that e.g. Firebird won't take advantage of an index in this particular case. To take advantage of it, the engine needs to know that nothing but count(*) is requested. If, for example, you filtered on anything in the where clause, or used sum instead of count (or even count(another_field)), the index would become completely useless - or even worse, because scanning a table through an index is slower than scanning it in natural order, and the index's selectivity is zero in this case.

An index would be needed if you wrote:

select x, count(*)
from t
where x between a and b
group by x

In this case, if we are lucky, an index on x could become selective.

<wild_guess>
Try dropping the index and see if that helps EDB improve the speed. It could be that it still tries to scan using the index, which is how we get from 50 secs to 150 secs.
</wild_guess>

Michael
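A sketch of the experiment suggested in the wild guess above. The index name idx_col10 is hypothetical, and DROP INDEX syntax varies slightly between engines (some require naming the table, e.g. DROP INDEX idx_col10 ON big_table):

-- Drop the index (hypothetical name), re-run the query, and compare timings.
DROP INDEX idx_col10;

SELECT col10, COUNT(*)
FROM big_table
GROUP BY col10;

-- Recreate the index afterwards to restore the original setup.
CREATE INDEX idx_col10 ON big_table (col10);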
Tue, Mar 20 2007 11:07 AM
"Ole Willy Tuv" | Michael,
<<< With index support, the grouping would take only a few seconds. >>> << This really depends on the type of index and whether the query can actually be optimized in this case. I know that e.g. Firebird won't take advantage of an index in this particular case. In order to take advantage you will need to know that no more values are requested but count(*). If, e.g. you would try to filter out anything in where clause or use sum instead of count (or even count(another field)) - the index would become completely useless and even worth, because scanning table using index is slower, then doing so in natural order and the selectiveness is zero in this case. >> I guess it depends on the implementation. SQL Server certainly uses index optimization in this case. SQL Server executes the following query: select col10, count(*), count(col1), min(col1), max(col1) from big_table group by col10 in 100 seconds without an index on the col10 column and 3 seconds with an index. << Try dropping index and see if that helps EDB to improve speed. It could be that it still tries to scan using index which is how we get form 50 secs to 150 secs. >> Nope, same timing with/without the index. Hopefully, ElevateDB will execute the query a lot faster when Tim has implemented index optimization of grouping and ordering. Ole Willy Tuv |
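Timings like these can be reproduced on SQL Server with SET STATISTICS TIME, which reports CPU and elapsed time per statement; the query is the one from the post above:

-- T-SQL: report parse/compile and execution times for the statement.
SET STATISTICS TIME ON;

SELECT col10, COUNT(*), COUNT(col1), MIN(col1), MAX(col1)
FROM big_table
GROUP BY col10;

SET STATISTICS TIME OFF;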
Tue, Mar 20 2007 2:18 PM
Michael Baytalsky

Ole,

> SQL Server executes the following query:
>
> select
>   col10,
>   count(*),
>   count(col1),
>   min(col1),
>   max(col1)
> from big_table
> group by col10
>
> in 100 seconds without an index on the col10 column and 3 seconds with an
> index.

3 secs for the above query is quite impressive... I don't see how an index can help optimize min(col1)... unless it stores extensive statistics and/or there are only a few distinct values stored in col1. On the other hand, 100 secs with no index seems a bit slow.

How about:

select
  col10,
  count(*),
  count(col1),
  min(col1 + 2 * col1),
  max(col1)
from big_table
group by col10

Do you by any chance have the script that creates big_table?

Michael
Tue, Mar 20 2007 2:58 PM
"Ole Willy Tuv" | Michael,
<< 3 secs for the above query is quite impressive... >> Very impressive, indeed. I just executed the query after a reboot to eliminate any caching and it only took 4 seconds, then 3 seconds on subsequent runs. << On the other hand 100 secs for no index seem a bit slow. >> Well, that might just be my system (Centrino laptop with 1 giga ram and a pretty full hd). Anyhow, it's not slow compared to other engines on the same system. Interbase 7.5 ran the simple "select col10, count(*)" query in 197 seconds without the index and 90 seconds with the index. << How about: select col10, count(*), count(col1), min(col1 + 2 * col1), max(col1) from big_table group by col10 >> 3.3 seconds << Do you by any chance have the script that creates big_table? >> I'll make a script and get back to you later. Ole Willy Tuv |
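The script itself never appears in this page of the thread, but a minimal sketch consistent with the stated description (10 columns, 5 million rows, 5 distinct col10 timestamps of roughly 1 million rows each) might look like the following. Everything except col1, col10, the row counts, and the index on col10 is an assumption:

-- Hypothetical reconstruction; column names other than col1 and col10,
-- all types, and the timestamp values are guesses.
CREATE TABLE big_table (
  col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER, col5 INTEGER,
  col6 VARCHAR(30), col7 VARCHAR(30), col8 VARCHAR(30), col9 VARCHAR(30),
  col10 TIMESTAMP
);

-- Seed one row per col10 group (5 distinct timestamp values).
INSERT INTO big_table VALUES (1, 0, 0, 0, 0, 'a', 'b', 'c', 'd', TIMESTAMP '2007-01-01 00:00:00');
INSERT INTO big_table VALUES (2, 0, 0, 0, 0, 'a', 'b', 'c', 'd', TIMESTAMP '2007-02-01 00:00:00');
INSERT INTO big_table VALUES (3, 0, 0, 0, 0, 'a', 'b', 'c', 'd', TIMESTAMP '2007-03-01 00:00:00');
INSERT INTO big_table VALUES (4, 0, 0, 0, 0, 'a', 'b', 'c', 'd', TIMESTAMP '2007-04-01 00:00:00');
INSERT INTO big_table VALUES (5, 0, 0, 0, 0, 'a', 'b', 'c', 'd', TIMESTAMP '2007-05-01 00:00:00');

-- Double the row count by running this statement 20 times:
-- 5 * 2^20 = 5,242,880 rows, about 1 million per col10 group.
-- Note: this repeats col1 values; Michael's later question about distinct
-- col1 values suggests varying col1 during population instead.
INSERT INTO big_table SELECT * FROM big_table;

CREATE INDEX idx_col10 ON big_table (col10);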
Tue, Mar 20 2007 4:29 PM
Michael Baytalsky

Ole,

> << 3 secs for the above query is quite impressive... >>
> Very impressive, indeed. I just executed the query after a reboot to
> eliminate any caching and it took only 4 seconds, then 3 seconds on
> subsequent runs.

I don't really understand how they do it - it's a miracle. I tried something similar on FB and it takes ~100-200 secs on such a huge database.

Actually, I was wrong when I estimated throughput at 30 MB/sec. On my system (also a 1 GB laptop), it degrades dramatically once the data file reaches ~1 GB. FB processes a 1.1 GB file in ~60 secs reliably; however, a 1.7 GB file takes anywhere from 170 to 500 secs, and the system may become disk bound.

To research this further, I wrote a simple Delphi program to write and read a huge file of plain test strings. The results are pretty strange:

1. 380 MB file: writing 20 secs, reading 1.7 secs = !!! 223 MB/sec
2. 1.4 GB file: writing 100 secs, reading 95 secs = about 15 MB/sec

It seems like reading slows down way too much for huge files. I will try it tomorrow on a 2 GB system to see how it performs. If my findings are correct, then multi-file database engines rule.

> << On the other hand, 100 secs with no index seems a bit slow. >>
> Well, that might just be my system (a Centrino laptop with 1 GB of RAM and
> a pretty full hard disk). Anyhow, it's not slow compared to other engines
> on the same system. Interbase 7.5 ran the simple "select col10, count(*)"
> query in 197 seconds without the index and 90 seconds with the index.

Yes, this seems about right - I was extrapolating performance based on results from a 400 MB database, and it seems I was wrong.

> << How about:
> select
>   col10,
>   count(*),
>   count(col1),
>   min(col1 + 2 * col1),
>   max(col1)
> from big_table
> group by col10 >>
>
> 3.3 seconds

Unbelievable. How can it process col1 at such speed? Are you sure you have different values in col1?

> << Do you by any chance have the script that creates big_table? >>
> I'll make a script and get back to you later.

Thanks.

Michael
Tue, Mar 20 2007 4:56 PM
Tim Young [Elevate Software]
Elevate Software, Inc.
timyoung@elevatesoft.com

Michael,

<< Unbelievable. How can it process col1 at such speed? >>

You would be amazed at how fast index leaf pages can be scanned when they are a) highly compressed due to a lot of repeating values and b) use a pretty decent page size (8k or more). A database engine can literally read millions of index entries in seconds.

You guys have me curious now - so perhaps I'll see what EDB could do with an index optimization on the GROUP BY.

--
Tim Young
Elevate Software
www.elevatesoft.com
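A rough back-of-envelope reading of that claim (the figures here are illustrative assumptions, not from the thread): if heavy compression of repeating values packs on the order of a thousand entries into each 8 KB leaf page, then 5 million index entries occupy only about 5,000 pages, roughly 40 MB, which a 2007-era disk streaming at 20-40 MB/sec can read in a second or two.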