Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 10 of 146 total
Thread Some Feedback Required
Mon, Sep 25 2006 5:47 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

I'm very close to getting the pre-beta completed, and I'd like to get some
feedback on the use of statistics in the indexes in ElevateDB.  First, a
little background......

Currently, ElevateDB uses statistics in indexes like DBISAM does, and these
facilities allow for this functionality in ElevateDB:

1) Accurate row sequence numbers, regardless of the active index and even
with ranges set (but not filters).  This also means an accurate grid
scrollbar when there are a) no filters or ranges or b) just ranges set on
the table cursor.

2) Exact I/O cost calculations for optimized index scans in queries and
filters.  This means that ElevateDB can very accurately determine if it is
more efficient to use a row scan or an index scan for a particular filtering
operation, be it for JOINs or a WHERE clause.

Now, those two items are significant.  However, the statistics in the
indexes also present some problems, especially with larger tables.  They
cause write I/O to occur on the indexes in proportion to the depth of the
index in terms of levels multiplied by the index page size, even if only one
page was actually updated.  For example, say an index tree is 3 levels deep.
That would mean that any update would incur 4096 bytes per page (default
page size) multiplied by the 3 levels to equal 12288 bytes.  If the index
statistics were not used, then the write I/O would only be 4096 bytes.  This
extra I/O is only really an issue when not using transactions and performing
individual row-by-row updates over several hundred rows or more.
Transactions keep the pages in memory, so the effects of the extra I/O are
somewhat negated.  In addition, the index statistics cause the indexes to be
larger than they would otherwise be, and this causes a bit more read and
write I/O in general.  And, of course, these problems simply aggregate as
the tables (and the indexes) get larger.  They particularly become an issue
with multi-millions of rows (10 mil+).

What I'm looking for in terms of feedback is:

Would you consider the removal of the index statistics, and the subsequent
removal of the ability for ElevateDB to provide option 1) above
significantly negative ?

Note: ElevateDB will still be able to provide option 2) above, although
slightly less accurately since it will have to perform estimates instead of
being able to give exact figures for I/O cost calculations.  Also, this is a
definite either-or choice since the code cannot be modified in a manner that
allows for both due to structural changes required in the actual indexes
themselves.  ElevateDB (and DBISAM) use a modified B+Tree structure to allow
for the index statistics, and a pure B+Tree, which the non-statistics
version would use, is slightly different.  Finally, the modifications
required to remove the statisitics are not more than a few days work, so any
delays would be minimal if I decide to go ahead with removing them.

--
Tim Young
Elevate Software
www.elevatesoft.com

Mon, Sep 25 2006 6:24 PMPermanent Link

"R. Tipton"
Stats in the indexes do slow things down a bit.
It would be nice if you could come up with
something like SFD (SuperFastDistinct).
I realy dont know how SFD itself would handle
STATISTICS but I will try and find out.
I guess some guys will want to keep it as is.
If SFD is as good as they claim on indexes then
a 3rd party addon will not break anyones bank
Now my thoughts are dump it speed up EDB
and buy SFD............ Wink

Rita.

"Tim Young [Elevate Software]" <timyoung@elevatesoft.com> wrote in message
news:E0068B77-0361-4132-B84C-F4CAB316FE8A@news.elevatesoft.com...
> I'm very close to getting the pre-beta completed, and I'd like to get some
> feedback on the use of statistics in the indexes in ElevateDB.  First, a
> little background......
>
> Currently, ElevateDB uses statistics in indexes like DBISAM does, and
> these facilities allow for this functionality in ElevateDB:
>
> 1) Accurate row sequence numbers, regardless of the active index and even
> with ranges set (but not filters).  This also means an accurate grid
> scrollbar when there are a) no filters or ranges or b) just ranges set on
> the table cursor.
>
> 2) Exact I/O cost calculations for optimized index scans in queries and
> filters.  This means that ElevateDB can very accurately determine if it is
> more efficient to use a row scan or an index scan for a particular
> filtering operation, be it for JOINs or a WHERE clause.
>
> Now, those two items are significant.  However, the statistics in the
> indexes also present some problems, especially with larger tables.  They
> cause write I/O to occur on the indexes in proportion to the depth of the
> index in terms of levels multiplied by the index page size, even if only
> one page was actually updated.  For example, say an index tree is 3 levels
> deep. That would mean that any update would incur 4096 bytes per page
> (default page size) multiplied by the 3 levels to equal 12288 bytes.  If
> the index statistics were not used, then the write I/O would only be 4096
> bytes.  This extra I/O is only really an issue when not using transactions
> and performing individual row-by-row updates over several hundred rows or
> more. Transactions keep the pages in memory, so the effects of the extra
> I/O are somewhat negated.  In addition, the index statistics cause the
> indexes to be larger than they would otherwise be, and this causes a bit
> more read and write I/O in general.  And, of course, these problems simply
> aggregate as the tables (and the indexes) get larger.  They particularly
> become an issue with multi-millions of rows (10 mil+).
>
> What I'm looking for in terms of feedback is:
>
> Would you consider the removal of the index statistics, and the subsequent
> removal of the ability for ElevateDB to provide option 1) above
> significantly negative ?
>
> Note: ElevateDB will still be able to provide option 2) above, although
> slightly less accurately since it will have to perform estimates instead
> of being able to give exact figures for I/O cost calculations.  Also, this
> is a definite either-or choice since the code cannot be modified in a
> manner that allows for both due to structural changes required in the
> actual indexes themselves.  ElevateDB (and DBISAM) use a modified B+Tree
> structure to allow for the index statistics, and a pure B+Tree, which the
> non-statistics version would use, is slightly different.  Finally, the
> modifications required to remove the statisitics are not more than a few
> days work, so any delays would be minimal if I decide to go ahead with
> removing them.
>
> --
> Tim Young
> Elevate Software
> www.elevatesoft.com
>

Mon, Sep 25 2006 7:41 PMPermanent Link

Charalabos Michael
Hello
Tim,

> Would you consider the removal of the index statistics, and the subsequent
> removal of the ability for ElevateDB to provide option 1) above
> significantly negative ?

Why not instead of removing code, comment it and then
test ElevateDB without Statistics to see if we really
want them or not ?

--
Charalabos Michael - [Creation Power] - http://www.creationpower.com -
http://www.creationpower.gr
Mon, Sep 25 2006 9:39 PMPermanent Link

John AJ Marknette

Avatar

Team Elevate Team Elevate

Tim Young [Elevate Software] wrote:

>
> What I'm looking for in terms of feedback is:
>
> Would you consider the removal of the index statistics, and the
> subsequent removal of the ability for ElevateDB to provide option 1)
> above significantly negative ?
>
> Note: ElevateDB will still be able to provide option 2) above,
> although slightly less accurately since it will have to perform
> estimates instead of being able to give exact figures for I/O cost
> calculations.  Also, this is a definite either-or choice since the
> code cannot be modified in a manner that allows for both due to
> structural changes required in the actual indexes themselves.
> ElevateDB (and DBISAM) use a modified B+Tree structure to allow for
> the index statistics, and a pure B+Tree, which the non-statistics
> version would use, is slightly different.  Finally, the modifications
> required to remove the statisitics are not more than a few days work,
> so any delays would be minimal if I decide to go ahead with removing
> them.


ElevateDB Desktop Edition = Modified B+Tree

ElevateDB Enterprise Edition = Pure B+Tree


See how simply that was. Wink

I'm sure you could come up with a clever compiler define. Plus this
gives the added benefit to be able to charge more for the enterprise
edition.

All jokes aside. If you can manage it, this way that would be the best
of both worlds.


- AJ Marknette






--
Mon, Sep 25 2006 9:57 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Rita,

<< Stats in the indexes do slow things down a bit. It would be nice if you
could come up with something like SFD (SuperFastDistinct). I realy dont know
how SFD itself would handle STATISTICS but I will try and find out. I guess
some guys will want to keep it as is. If SFD is as good as they claim on
indexes then a 3rd party addon will not break anyones bank Now my thoughts
are dump it speed up EDB and buy SFD............  Wink>>

The effect of the index statistics on DISTINCT operations is minimal,
unfortunately.   The only thing that DBISAM and ElevateDB don't do with
regard to distinct processing in terms of nice "shortcuts" is use existing
indexes, which is what I believe SFD does.

--
Tim Young
Elevate Software
www.elevatesoft.com

Mon, Sep 25 2006 10:01 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Michael,

<< Why not instead of removing code, comment it and then test ElevateDB
without Statistics to see if we really want them or not ? >>

Well, since I just told you what the only negative effect would be (no more
sequence numbers), I initially thought that would be enough to elicit a
response one way or the other. Wink

Also, the point is not how to handle changing the code (we have backups of
everything anyways), but rather whether it is something that I can pursue
without pissing off too many customers. Smiley

--
Tim Young
Elevate Software
www.elevatesoft.com

Mon, Sep 25 2006 10:04 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

AJ,

<< ElevateDB Desktop Edition = Modified B+Tree

ElevateDB Enterprise Edition = Pure B+Tree

See how simply that was. Wink>>

I wish. Smiley Unfortunately it's a no-go since making two different and
incompatible table formats would be a killer for acceptance out of the gate.

<< I'm sure you could come up with a clever compiler define. Plus this gives
the added benefit to be able to charge more for the enterprise
edition. >>

We already have future plans for that (enterprise options in terms of
transactions, etc.) with ElevateDB, so I don't want to use that up on a
simple format change.

<< All jokes aside. If you can manage it, this way that would be the best of
both worlds. >>

I'm afraid that it's not an option in this case.

--
Tim Young
Elevate Software
www.elevatesoft.com

Mon, Sep 25 2006 11:20 PMPermanent Link

"Jerry Hayes"
With a new version and a lot of new code, I'd say to just keep with the
existing scheme; you're intimately familiar with how it works and there's
always ability to add an option for a different indexing scheme later.

Keep the project on track with the compatible behavior; there's always
enough extra surprises Wink

Tue, Sep 26 2006 3:11 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Tim


Reading your post I can't see anything particularly negative if the statistics are removed, and since I don't have multi-million row tables, I also can't see anything particularly positive.

Just to make sure I've got it right:

NO STATS - smaller table sizes, faster writes, "inaccurate row sequence numbers", possibly slightly slower queries due to inability to estimate accurately

STATS - larger table sizes, slower writes, "accurate row sequence numbers", possibly slightly faster queries due to ability to estimate accurately

I don't use the row sequence numbers (I also don't understand what they have to do with statistics) so I don't care
Query speed, reading your post, shouldn't be impacted much (row scan or an index scan) and guessing it'll only be on "weird" cases

If I'm right I'd go for no stats.


Roy Lambert
Tue, Sep 26 2006 4:20 AMPermanent Link

"Stefano Monterisi"
Hi Tim.....


......and a property at database level (or better table level) like INDEXSTAT
= TRUE/FALSE  than can be set by programmers on project's needs?
You have to manage 2 different logic with a great job, but we could be
happier....Smile
(I don't known what can be your effort, honestly)

Stefano Monterisi




Page 1 of 15Next Page »
Jump to Page:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Image