Icon View Thread

The following is the text of the current message along with any replies.
Messages 11 to 20 of 20 total
Thread Very slow grouping
Tue, Mar 20 2007 5:13 PMPermanent Link

Michael Baytalsky
Tim & Ole,

> << Unbelievable. How can it process col1 at such speed? >>
> You would be amazed at how fast index leaf pages can be scanned when they
> are a) highly-compressed due to a lot of repeating values and b) use a
> pretty decent page size (8k or more).   A database engine can literally read
> millions of index entries in seconds.
Well, reading index pages is not a problem. However, in that query
col1 is not indexed and even if an index exists, there's still no way
to determine min value of Col1 for each Col10.

> You guys have me curious now - so perhaps I'll see what EDB could do with an
> index optimization on the GROUP BY.
That would be interesting. IMO, unless you can answer the question by
simply reading stats, there should be little improvement in speed because,
as you said reading is extremely fast and reading things in native order
should be 2-3 times faster, then doing so in index order. Therefore,
if a query requires columns not found in current index, using
index may actually degrade performance... This is my guess, of course.
IOW, don't optimize it too much Wink.


I just run an interesting test. I created FB database with file size
limitation of 800MB. It can span into multiple files once the limit
is reached, so the database which would normally be 1 file 1.6Gb is now
2 files by 800MB. Both databases only contain one table with the same
data (similar to Ole's test - 5M records). The results are quite amazing:
1. Single file 1.6 GB database - 172 sec.
2. 2 by 800MB database - 73 sec.

2.5 times the difference. Same query, same data, same computer, same database.
IOW, don't put too much data in one file Wink


Michael
Tue, Mar 20 2007 5:39 PMPermanent Link

"Ole Willy Tuv"
Michael,

<< Do you by any chance have the script that creates big_table? >>

A Transact-SQL script to create big_table is enclosed.

Ole Willy Tuv





Attachments: big_table.sql
Tue, Mar 20 2007 5:43 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Michael,

<< Well, reading index pages is not a problem. However, in that query col1
is not indexed and even if an index exists, there's still no way to
determine min value of Col1 for each Col10. >>

Good point.  I completely missed that we were now talking about a different
query and was still thinking about the original query, which could benefit
from a simple index scan.

<< That would be interesting. IMO, unless you can answer the question by
simply reading stats, there should be little improvement in speed because,
as you said reading is extremely fast and reading things in native order
should be 2-3 times faster, then doing so in index order. Therefore, if a
query requires columns not found in current index, using
index may actually degrade performance... This is my guess, of course. IOW,
don't optimize it too much Wink. >>

This is all very true. Smiley But, I doubt that SQL Server is keeping *exact*
stats anywhere like DBISAM does (and EDB does not), so the best they could
come up with is an estimated projection based upon some simpler index stats
or a separate table of row value stats kept from an UPDATE STATISTICS type
of statement.  However, AFAIK, such stats are just approximations used for
optimizations of row selections.

<< 2.5 times the difference. Same query, same data, same computer, same
database. IOW, don't put too much data in one file Wink >>

Very interesting.  Does it help to increase the DB page size for even the
single-file database ?

--
Tim Young
Elevate Software
www.elevatesoft.com

Tue, Mar 20 2007 7:44 PMPermanent Link

Michael Baytalsky

> A Transact-SQL script to create big_table is enclosed.
Thanks.

Michael
Tue, Mar 20 2007 7:57 PMPermanent Link

Michael Baytalsky
Tim,

> << 2.5 times the difference. Same query, same data, same computer, same
> database. IOW, don't put too much data in one file Wink >>
>
> Very interesting.  Does it help to increase the DB page size for even the
> single-file database ?
Nope. I did simple test application and tried reading huge file by blocks
of 4, 8, 16, 32, 64 KB. There's no significant difference in speed.
(IOW, it's not the database, it's the file system itself).
The speed degrades on huge files no matter what block size you use.
I suspect that it may have something to do with how NTFS works. I will
test it further on a different computer with more RAM tomorrow to see
if it is because Windows cannot allocate enough memory to hold file
allocation pointers or whatever.
It seems like accessing file pages located too far from the beginning
of the file is much slower for some reason - reading goes with almost
same speed as writing, which is ridiculous. We are talking about
purely sequential reading, not random access, btw. This thing should
be researched, I think - it may help to avoid problems like the one Ralf
was having with large blob file.


Michael



Tue, Mar 20 2007 8:31 PMPermanent Link

"Ole Willy Tuv"
EDB version (procedure) enclosed.

Ole Willy Tuv





Attachments: sp_big_table.sql
Tue, Mar 20 2007 8:36 PMPermanent Link

Michael Baytalsky
Tim,

>> Very interesting.  Does it help to increase the DB page size for even
>> the single-file database ?
> The speed degrades on huge files no matter what block size you use.
No, I'm taking my words back here - it seems like windows cache fooled me Wink
The raw speed is roughly at ~30MB/sec, however with huge files read caching
doesn't work effectively, so subsequent reads don't increase speed of
reading. This is why smaller database files may work faster (even though
engine doesn't cache the whole thing it is still nicely cached by OS).
So, typically the performance on 200MB file is more then 5 times faster
then on 1Gb file.


Michael
Wed, Mar 21 2007 6:22 AMPermanent Link

Michael Baytalsky
Ole,

> << Do you by any chance have the script that creates big_table? >>
> A Transact-SQL script to create big_table is enclosed.

Hey, I just looked at big_table definition and see that col1 is
identity and is pk. No wonder it knows min and max of it for each col10 Wink
There's no miracle - it just gets information from index stats.
Try this query:
select col10,  count(col10), min(col3) from big_table group by col10




Michael
Wed, Mar 21 2007 6:51 AMPermanent Link

"Ole Willy Tuv"
Michael,

<< Hey, I just looked at big_table definition and see that col1 is identity
and is pk. No wonder it knows min and max of it for each col10 Wink There's
no miracle - it just gets information from index stats. >>

Yes, but the main point is that SQL Server does use available indexes to
group the table, and does the grouping very efficiently as opposed to e.g.
Interbase.

<< select col10,  count(col10), min(col3) from big_table group by col10 >>

Correct, this query is not optimized - 100 seconds. If I needed to access
the col3 column in a frequently used production query of this type, I'd
create an index on col3.

Ole Willy Tuv

Wed, Mar 21 2007 8:09 AMPermanent Link

Michael Baytalsky
Ole,

> << Hey, I just looked at big_table definition and see that col1 is identity
> and is pk. No wonder it knows min and max of it for each col10 Wink There's
> no miracle - it just gets information from index stats. >>
Well, it's good but in reality there are not so many tasks where such
optimization will work. The one example you gave is the pretty much the
only one I can think of. Not a bad example though and that particular
optimization would be nice to have...

> Yes, but the main point is that SQL Server does use available indexes to
> group the table, and does the grouping very efficiently as opposed to e.g.
> Interbase.
That's only true for PK/identity field and the field being grouped. Other
then that, if you have more complex logic FB can even outperform MS SQL
or at least not gonna be much slower.

> << select col10,  count(col10), min(col3) from big_table group by col10 >>
> Correct, this query is not optimized - 100 seconds.
You see what I mean? This query executes in ~90 secs on FB here without any
indexes whatsoever.

> If I needed to access
> the col3 column in a frequently used production query of this type, I'd
> create an index on col3.
Unfortunately, I doubt that having index of col3 will help you any in this case
- you can try it. I mean, create index on col3 and see if it helps the above
select: select col10,  count(col10), min(col3) from big_table group by col10

AFAIK, The only reason you were able to get that fast before is because you were
using identity PK which is probably stored in col10 index along with col10 values.


Michael
« Previous PagePage 2 of 2
Jump to Page:  1 2
Image