Very slow grouping |
Tue, Mar 20 2007 5:13 PM
Michael Baytalsky | Tim & Ole,
> << Unbelievable. How can it process col1 at such speed? >>
>
> You would be amazed at how fast index leaf pages can be scanned when they are a) highly compressed due to a lot of repeating values and b) use a pretty decent page size (8K or more). A database engine can literally read millions of index entries in seconds.

Well, reading index pages is not a problem. However, in that query col1 is not indexed, and even if an index existed, there would still be no way to determine the minimum value of col1 for each col10.

> You guys have me curious now - so perhaps I'll see what EDB could do with an index optimization on the GROUP BY.

That would be interesting. IMO, unless you can answer the question simply by reading stats, there should be little improvement in speed because, as you said, reading is extremely fast, and reading things in native order should be 2-3 times faster than doing so in index order. Therefore, if a query requires columns not found in the current index, using the index may actually degrade performance... This is my guess, of course. IOW, don't optimize it too much.

I just ran an interesting test. I created a FB database with a file size limit of 800 MB. It can span multiple files once the limit is reached, so a database that would normally be a single 1.6 GB file is now two 800 MB files. Both databases contain only one table with the same data (similar to Ole's test - 5M records). The results are quite amazing:

1. Single-file 1.6 GB database - 172 sec.
2. Two 800 MB files - 73 sec.

That's 2.5 times the difference. Same query, same data, same computer, same database. IOW, don't put too much data in one file.

Michael
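Michael's single-file-versus-split-files measurement can be sketched with a small Python harness. Everything here is a stand-in: the file names are invented, and the files are tiny, so at this scale the OS cache hides the effect he measured on 1.6 GB of data - this only illustrates the shape of the experiment.

```python
import os
import tempfile
import time

def sequential_read(paths, block_size=64 * 1024):
    """Read each file front to back; return (elapsed seconds, total bytes)."""
    total = 0
    start = time.perf_counter()
    for path in paths:
        with open(path, "rb") as f:
            while chunk := f.read(block_size):
                total += len(chunk)
    return time.perf_counter() - start, total

# One 2 MB file vs two 1 MB halves (stand-ins for 1.6 GB vs 2 x 800 MB).
data = os.urandom(2 << 20)
tmpdir = tempfile.mkdtemp()
single = os.path.join(tmpdir, "db.one")
half_a = os.path.join(tmpdir, "db.a")
half_b = os.path.join(tmpdir, "db.b")
with open(single, "wb") as f:
    f.write(data)
with open(half_a, "wb") as f:
    f.write(data[: 1 << 20])
with open(half_b, "wb") as f:
    f.write(data[1 << 20 :])

t1, n1 = sequential_read([single])
t2, n2 = sequential_read([half_a, half_b])
print(f"single file: {n1} bytes in {t1:.4f}s")
print(f"two halves : {n2} bytes in {t2:.4f}s")
```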
Tue, Mar 20 2007 5:39 PM
"Ole Willy Tuv" | Michael,
<< Do you by any chance have the script that creates big_table? >>

A Transact-SQL script to create big_table is enclosed.

Ole Willy Tuv

Attachments: big_table.sql
Tue, Mar 20 2007 5:43 PM
Tim Young (Elevate Software, timyoung@elevatesoft.com) | Michael,
<< Well, reading index pages is not a problem. However, in that query col1 is not indexed and even if an index exists, there's still no way to determine min value of Col1 for each Col10. >>

Good point. I completely missed that we were now talking about a different query and was still thinking about the original query, which could benefit from a simple index scan.

<< That would be interesting. IMO, unless you can answer the question by simply reading stats, there should be little improvement in speed because, as you said, reading is extremely fast and reading things in native order should be 2-3 times faster than doing so in index order. Therefore, if a query requires columns not found in the current index, using the index may actually degrade performance... This is my guess, of course. IOW, don't optimize it too much. >>

This is all very true. But I doubt that SQL Server keeps *exact* stats the way DBISAM does (and EDB does not), so the best they could come up with is an estimated projection based upon simpler index stats, or a separate table of row value stats maintained by an UPDATE STATISTICS type of statement. However, AFAIK, such stats are just approximations used for optimizing row selections.

<< 2.5 times the difference. Same query, same data, same computer, same database. IOW, don't put too much data in one file >>

Very interesting. Does it help to increase the DB page size even for the single-file database?

--
Tim Young
Elevate Software
www.elevatesoft.com
Tue, Mar 20 2007 7:44 PM
Michael Baytalsky | > A Transact-SQL script to create big_table is enclosed. Thanks. Michael |
Tue, Mar 20 2007 7:57 PM
Michael Baytalsky | Tim,
> << 2.5 times the difference. Same query, same data, same computer, same database. IOW, don't put too much data in one file >>
>
> Very interesting. Does it help to increase the DB page size for even the single-file database?

Nope. I wrote a simple test application and tried reading a huge file in blocks of 4, 8, 16, 32 and 64 KB. There's no significant difference in speed. (IOW, it's not the database, it's the file system itself.) The speed degrades on huge files no matter what block size you use. I suspect it may have something to do with how NTFS works. I will test it further tomorrow on a different computer with more RAM, to see whether it happens because Windows cannot allocate enough memory to hold the file allocation pointers or whatever. For some reason, accessing file pages located far from the beginning of the file seems much slower - reading goes at almost the same speed as writing, which is ridiculous. We are talking about purely sequential reading, not random access, btw. This should be researched further, I think - it may help avoid problems like the one Ralf was having with a large BLOB file.

Michael
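Michael's block-size experiment can be sketched in Python. This is a hedged stand-in, not his test application: the file here is only 1 MB, so after the first pass everything is served from the OS cache and the timings won't reproduce his NTFS observations - it only shows the measurement harness.

```python
import os
import tempfile
import time

def time_sequential_read(path, block_size):
    """Read the file front to back in fixed-size blocks;
    return (elapsed seconds, total bytes read)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    return time.perf_counter() - start, total

# Small temp file as a stand-in for the "huge" file in Michael's test.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1 << 20))  # 1 MB
    path = tmp.name

results = {}
for kb in (4, 8, 16, 32, 64):
    elapsed, nbytes = time_sequential_read(path, kb * 1024)
    results[kb] = nbytes
    print(f"{kb:>2} KB blocks: {nbytes} bytes in {elapsed:.4f}s")

os.remove(path)
```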
Tue, Mar 20 2007 8:31 PM
"Ole Willy Tuv" | EDB version (procedure) enclosed.
Ole Willy Tuv

Attachments: sp_big_table.sql
Tue, Mar 20 2007 8:36 PM
Michael Baytalsky | Tim,
>> Very interesting. Does it help to increase the DB page size for even the single-file database?
>
> The speed degrades on huge files no matter what block size you use.

No, I'm taking my words back here - it seems the Windows cache fooled me. The raw speed is roughly ~30 MB/sec; however, with huge files read caching doesn't work effectively, so subsequent reads don't get any faster. This is why smaller database files may work faster (even though the engine doesn't cache the whole thing, it is still nicely cached by the OS). So, typically, performance on a 200 MB file is more than 5 times faster than on a 1 GB file.

Michael
Wed, Mar 21 2007 6:22 AM
Michael Baytalsky | Ole,
> << Do you by any chance have the script that creates big_table? >>
>
> A Transact-SQL script to create big_table is enclosed.

Hey, I just looked at the big_table definition and see that col1 is an identity column and the PK. No wonder it knows the min and max of it for each col10 - there's no miracle, it just gets the information from the index stats. Try this query:

select col10, count(col10), min(col3)
from big_table
group by col10

Michael
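The query Michael proposes can be tried on any engine. The sketch below uses SQLite as a stand-in (the table shape and data are invented to resemble the thread's big_table, not taken from Ole's script): col1 is an identity-style integer PK, col10 has only 10 distinct values, and col3 is an ordinary unindexed column, so min(col3) per group cannot be read off any PK or index stats.

```python
import sqlite3

# Tiny stand-in for the thread's big_table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE big_table ("
    " col1 INTEGER PRIMARY KEY,"  # identity-style PK
    " col3 INTEGER,"              # ordinary unindexed column
    " col10 INTEGER)"             # only 10 distinct values
)
conn.executemany(
    "INSERT INTO big_table (col3, col10) VALUES (?, ?)",
    [(i * 7 % 1000, i % 10) for i in range(10_000)],
)

# Michael's query: the engine has to visit the rows to compute min(col3),
# because no index covers col3.
rows = conn.execute(
    "SELECT col10, COUNT(col10), MIN(col3)"
    " FROM big_table GROUP BY col10 ORDER BY col10"
).fetchall()
for col10, cnt, min_col3 in rows:
    print(col10, cnt, min_col3)
```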
Wed, Mar 21 2007 6:51 AM
"Ole Willy Tuv" | Michael,
<< Hey, I just looked at big_table definition and see that col1 is identity and is pk. No wonder it knows min and max of it for each col10 There's no miracle - it just gets information from index stats. >>

Yes, but the main point is that SQL Server does use available indexes to group the table, and does the grouping very efficiently, as opposed to e.g. InterBase.

<< select col10, count(col10), min(col3) from big_table group by col10 >>

Correct, this query is not optimized - 100 seconds. If I needed to access the col3 column in a frequently used production query of this type, I'd create an index on col3.

Ole Willy Tuv
Wed, Mar 21 2007 8:09 AM
Michael Baytalsky | Ole,
> << Hey, I just looked at big_table definition and see that col1 is identity and is pk. No wonder it knows min and max of it for each col10 There's no miracle - it just gets information from index stats. >>

Well, it's good, but in reality there are not many tasks where such an optimization will work. The example you gave is pretty much the only one I can think of. Not a bad example, though, and that particular optimization would be nice to have...

> Yes, but the main point is that SQL Server does use available indexes to group the table, and does the grouping very efficiently as opposed to e.g. Interbase.

That's only true for a PK/identity field and the field being grouped. Other than that, if you have more complex logic, FB can even outperform MS SQL, or at least won't be much slower.

> << select col10, count(col10), min(col3) from big_table group by col10 >>
>
> Correct, this query is not optimized - 100 seconds.

You see what I mean? This query executes in ~90 secs on FB here without any indexes whatsoever.

> If I needed to access the col3 column in a frequently used production query of this type, I'd create an index on col3.

Unfortunately, I doubt that an index on col3 alone will help you in this case - you can try it. I mean, create the index on col3 and see if it helps the above select:

select col10, count(col10), min(col3)
from big_table
group by col10

AFAIK, the only reason you were able to get it that fast before is that you were using an identity PK, which is probably stored in the col10 index along with the col10 values.

Michael
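Michael's doubt can be checked on a stand-in engine. The SQLite sketch below (table shape and index names invented, and SQLite's planner may of course differ from SQL Server's or Firebird's) compares the query plan with an index on col3 alone against a composite (col10, col3) index: only the composite index lets the engine answer the whole query from the index, exactly the kind of covering access the thread is debating.

```python
import sqlite3

QUERY = ("SELECT col10, COUNT(col10), MIN(col3)"
         " FROM big_table GROUP BY col10 ORDER BY col10")

def plan(conn, sql):
    """Concatenate the EXPLAIN QUERY PLAN detail strings for a statement."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE big_table ("
                 " col1 INTEGER PRIMARY KEY, col3 INTEGER, col10 INTEGER)")
    conn.executemany("INSERT INTO big_table (col3, col10) VALUES (?, ?)",
                     [(i % 97, i % 10) for i in range(5000)])
    return conn

# Index on col3 alone: it says nothing about col10, so the planner still
# scans and groups the whole table.
single = make_db()
single.execute("CREATE INDEX idx_c3 ON big_table (col3)")
print(plan(single, QUERY))

# Composite index on (col10, col3): grouping and MIN(col3) can both be
# read straight off the index, so it is used as a covering index.
composite = make_db()
composite.execute("CREATE INDEX idx_c10_c3 ON big_table (col10, col3)")
print(plan(composite, QUERY))
```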