Support Forums - View Thread

View Thread

The following is the text of the current message along with any replies.

Messages 1 to 10 of 146 total

Mon, Sep 25 2006 5:47 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	I'm very close to getting the pre-beta completed, and I'd like to get some feedback on the use of statistics in the indexes in ElevateDB. First, a little background...... Currently, ElevateDB uses statistics in indexes like DBISAM does, and these facilities allow for this functionality in ElevateDB: 1) Accurate row sequence numbers, regardless of the active index and even with ranges set (but not filters). This also means an accurate grid scrollbar when there are a) no filters or ranges or b) just ranges set on the table cursor. 2) Exact I/O cost calculations for optimized index scans in queries and filters. This means that ElevateDB can very accurately determine if it is more efficient to use a row scan or an index scan for a particular filtering operation, be it for JOINs or a WHERE clause. Now, those two items are significant. However, the statistics in the indexes also present some problems, especially with larger tables. They cause write I/O to occur on the indexes in proportion to the depth of the index in terms of levels multiplied by the index page size, even if only one page was actually updated. For example, say an index tree is 3 levels deep. That would mean that any update would incur 4096 bytes per page (default page size) multiplied by the 3 levels to equal 12288 bytes. If the index statistics were not used, then the write I/O would only be 4096 bytes. This extra I/O is only really an issue when not using transactions and performing individual row-by-row updates over several hundred rows or more. Transactions keep the pages in memory, so the effects of the extra I/O are somewhat negated. In addition, the index statistics cause the indexes to be larger than they would otherwise be, and this causes a bit more read and write I/O in general. And, of course, these problems simply aggregate as the tables (and the indexes) get larger. They particularly become an issue with multi-millions of rows (10 mil+). What I'm looking for in terms of feedback is: Would you consider the removal of the index statistics, and the subsequent removal of the ability for ElevateDB to provide option 1) above significantly negative ? Note: ElevateDB will still be able to provide option 2) above, although slightly less accurately since it will have to perform estimates instead of being able to give exact figures for I/O cost calculations. Also, this is a definite either-or choice since the code cannot be modified in a manner that allows for both due to structural changes required in the actual indexes themselves. ElevateDB (and DBISAM) use a modified B+Tree structure to allow for the index statistics, and a pure B+Tree, which the non-statistics version would use, is slightly different. Finally, the modifications required to remove the statisitics are not more than a few days work, so any delays would be minimal if I decide to go ahead with removing them. -- Tim Young Elevate Software www.elevatesoft.com
Mon, Sep 25 2006 6:24 PM	Permanent Link
"R. Tipton"	Stats in the indexes do slow things down a bit. It would be nice if you could come up with something like SFD (SuperFastDistinct). I realy dont know how SFD itself would handle STATISTICS but I will try and find out. I guess some guys will want to keep it as is. If SFD is as good as they claim on indexes then a 3rd party addon will not break anyones bank Now my thoughts are dump it speed up EDB and buy SFD............ Rita. "Tim Young [Elevate Software]" <timyoung@elevatesoft.com> wrote in message news:E0068B77-0361-4132-B84C-F4CAB316FE8A@news.elevatesoft.com... > I'm very close to getting the pre-beta completed, and I'd like to get some > feedback on the use of statistics in the indexes in ElevateDB. First, a > little background...... > > Currently, ElevateDB uses statistics in indexes like DBISAM does, and > these facilities allow for this functionality in ElevateDB: > > 1) Accurate row sequence numbers, regardless of the active index and even > with ranges set (but not filters). This also means an accurate grid > scrollbar when there are a) no filters or ranges or b) just ranges set on > the table cursor. > > 2) Exact I/O cost calculations for optimized index scans in queries and > filters. This means that ElevateDB can very accurately determine if it is > more efficient to use a row scan or an index scan for a particular > filtering operation, be it for JOINs or a WHERE clause. > > Now, those two items are significant. However, the statistics in the > indexes also present some problems, especially with larger tables. They > cause write I/O to occur on the indexes in proportion to the depth of the > index in terms of levels multiplied by the index page size, even if only > one page was actually updated. For example, say an index tree is 3 levels > deep. That would mean that any update would incur 4096 bytes per page > (default page size) multiplied by the 3 levels to equal 12288 bytes. If > the index statistics were not used, then the write I/O would only be 4096 > bytes. This extra I/O is only really an issue when not using transactions > and performing individual row-by-row updates over several hundred rows or > more. Transactions keep the pages in memory, so the effects of the extra > I/O are somewhat negated. In addition, the index statistics cause the > indexes to be larger than they would otherwise be, and this causes a bit > more read and write I/O in general. And, of course, these problems simply > aggregate as the tables (and the indexes) get larger. They particularly > become an issue with multi-millions of rows (10 mil+). > > What I'm looking for in terms of feedback is: > > Would you consider the removal of the index statistics, and the subsequent > removal of the ability for ElevateDB to provide option 1) above > significantly negative ? > > Note: ElevateDB will still be able to provide option 2) above, although > slightly less accurately since it will have to perform estimates instead > of being able to give exact figures for I/O cost calculations. Also, this > is a definite either-or choice since the code cannot be modified in a > manner that allows for both due to structural changes required in the > actual indexes themselves. ElevateDB (and DBISAM) use a modified B+Tree > structure to allow for the index statistics, and a pure B+Tree, which the > non-statistics version would use, is slightly different. Finally, the > modifications required to remove the statisitics are not more than a few > days work, so any delays would be minimal if I decide to go ahead with > removing them. > > -- > Tim Young > Elevate Software > www.elevatesoft.com >
Mon, Sep 25 2006 7:41 PM	Permanent Link
Charalabos Michael	Hello Tim, > Would you consider the removal of the index statistics, and the subsequent > removal of the ability for ElevateDB to provide option 1) above > significantly negative ? Why not instead of removing code, comment it and then test ElevateDB without Statistics to see if we really want them or not ? -- Charalabos Michael - [Creation Power] - http://www.creationpower.com - http://www.creationpower.gr
Mon, Sep 25 2006 9:39 PM	Permanent Link
John AJ Marknette Team Elevate	Tim Young [Elevate Software] wrote: > > What I'm looking for in terms of feedback is: > > Would you consider the removal of the index statistics, and the > subsequent removal of the ability for ElevateDB to provide option 1) > above significantly negative ? > > Note: ElevateDB will still be able to provide option 2) above, > although slightly less accurately since it will have to perform > estimates instead of being able to give exact figures for I/O cost > calculations. Also, this is a definite either-or choice since the > code cannot be modified in a manner that allows for both due to > structural changes required in the actual indexes themselves. > ElevateDB (and DBISAM) use a modified B+Tree structure to allow for > the index statistics, and a pure B+Tree, which the non-statistics > version would use, is slightly different. Finally, the modifications > required to remove the statisitics are not more than a few days work, > so any delays would be minimal if I decide to go ahead with removing > them. ElevateDB Desktop Edition = Modified B+Tree ElevateDB Enterprise Edition = Pure B+Tree See how simply that was. I'm sure you could come up with a clever compiler define. Plus this gives the added benefit to be able to charge more for the enterprise edition. All jokes aside. If you can manage it, this way that would be the best of both worlds. - AJ Marknette --
Mon, Sep 25 2006 9:57 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Rita, << Stats in the indexes do slow things down a bit. It would be nice if you could come up with something like SFD (SuperFastDistinct). I realy dont know how SFD itself would handle STATISTICS but I will try and find out. I guess some guys will want to keep it as is. If SFD is as good as they claim on indexes then a 3rd party addon will not break anyones bank Now my thoughts are dump it speed up EDB and buy SFD............ >> The effect of the index statistics on DISTINCT operations is minimal, unfortunately. The only thing that DBISAM and ElevateDB don't do with regard to distinct processing in terms of nice "shortcuts" is use existing indexes, which is what I believe SFD does. -- Tim Young Elevate Software www.elevatesoft.com
Mon, Sep 25 2006 10:01 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Michael, << Why not instead of removing code, comment it and then test ElevateDB without Statistics to see if we really want them or not ? >> Well, since I just told you what the only negative effect would be (no more sequence numbers), I initially thought that would be enough to elicit a response one way or the other. Also, the point is not how to handle changing the code (we have backups of everything anyways), but rather whether it is something that I can pursue without pissing off too many customers. -- Tim Young Elevate Software www.elevatesoft.com
Mon, Sep 25 2006 10:04 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	AJ, << ElevateDB Desktop Edition = Modified B+Tree ElevateDB Enterprise Edition = Pure B+Tree See how simply that was. >> I wish. Unfortunately it's a no-go since making two different and incompatible table formats would be a killer for acceptance out of the gate. << I'm sure you could come up with a clever compiler define. Plus this gives the added benefit to be able to charge more for the enterprise edition. >> We already have future plans for that (enterprise options in terms of transactions, etc.) with ElevateDB, so I don't want to use that up on a simple format change. << All jokes aside. If you can manage it, this way that would be the best of both worlds. >> I'm afraid that it's not an option in this case. -- Tim Young Elevate Software www.elevatesoft.com
Mon, Sep 25 2006 11:20 PM	Permanent Link
"Jerry Hayes"	With a new version and a lot of new code, I'd say to just keep with the existing scheme; you're intimately familiar with how it works and there's always ability to add an option for a different indexing scheme later. Keep the project on track with the compatible behavior; there's always enough extra surprises
Tue, Sep 26 2006 3:11 AM	Permanent Link
Roy Lambert NLH Associates Team Elevate	Tim Reading your post I can't see anything particularly negative if the statistics are removed, and since I don't have multi-million row tables, I also can't see anything particularly positive. Just to make sure I've got it right: NO STATS - smaller table sizes, faster writes, "inaccurate row sequence numbers", possibly slightly slower queries due to inability to estimate accurately STATS - larger table sizes, slower writes, "accurate row sequence numbers", possibly slightly faster queries due to ability to estimate accurately I don't use the row sequence numbers (I also don't understand what they have to do with statistics) so I don't care Query speed, reading your post, shouldn't be impacted much (row scan or an index scan) and guessing it'll only be on "weird" cases If I'm right I'd go for no stats. Roy Lambert
Tue, Sep 26 2006 4:20 AM	Permanent Link
"Stefano Monterisi"	Hi Tim..... ......and a property at database level (or better table level) like INDEXSTAT = TRUE/FALSE than can be set by programmers on project's needs? You have to manage 2 different logic with a great job, but we could be happier.... (I don't known what can be your effort, honestly) Stefano Monterisi

	Page 1 of 15	Next Page »
Jump to Page: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15