Support Forums - View Thread

View Thread

The following is the text of the current message along with any replies.

Messages 1 to 10 of 10 total

Tue, Oct 30 2007 11:10 PM	Permanent Link
Dave Harrison	DBISAM and ElevateDb are the only databases I've seen where it make no speed difference if the index is built before or after the data is loaded. They both complete at nearly the same times. Every other database I tested will load data faster if the table doesn't have any indexes and the index is built after the data has been loaded. Dave
Wed, Oct 31 2007 1:12 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Dave, << DBISAM and ElevateDb are the only databases I've seen where it make no speed difference if the index is built before or after the data is loaded. They both complete at nearly the same times. Every other database I tested will load data faster if the table doesn't have any indexes and the index is built after the data has been loaded. >> It's due to the way that the indexes are built in both. There are two ways to build an index: 1) By looping through the rows and building the indexes row by row as if the rows were "naturally" inserted. This is what DBISAM and EDB do, and is why you don't see a difference. 2) Building the index in bulk by looping through the rows, extracting the keys, sorting them using a merge-sort, and then writing them out page-by-page and building the index from the leaf pages of the tree upwards. This is a bit faster and results in a very compact index initially, but later on the index will develop pockets of free space as new keys are inserted/deleted from the index and will eventually end up just like 1). -- Tim Young Elevate Software www.elevatesoft.com
Wed, Oct 31 2007 4:37 PM	Permanent Link
Dave Harrison	Tim Young [Elevate Software] wrote: > Dave, > > << DBISAM and ElevateDb are the only databases I've seen where it make no > speed difference if the index is built before or after the data is loaded. > They both complete at nearly the same times. Every other database I tested > will load data faster if the table doesn't have any indexes and the index is > built after the data has been loaded. >> > > It's due to the way that the indexes are built in both. There are two ways > to build an index: > > 1) By looping through the rows and building the indexes row by row as if the > rows were "naturally" inserted. This is what DBISAM and EDB do, and is why > you don't see a difference. > > 2) Building the index in bulk by looping through the rows, extracting the > keys, sorting them using a merge-sort, and then writing them out > page-by-page and building the index from the leaf pages of the tree upwards. > This is a bit faster and results in a very compact index initially, but > later on the index will develop pockets of free space as new keys are > inserted/deleted from the index and will eventually end up just like 1). I would have thought an SQL "Create Index On Table ..." would have been implemented by #2 then you'd at least get the index built twice as fast. Just because later on, the index may eventually degrade into #1, is no reason not to optimize index creation. I have a lot of tables that are readonly. Building a couple of indexes on a large table is an overnight process that I'd like to avoid. Dave
Thu, Nov 1 2007 4:53 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Dave, << I would have thought an SQL "Create Index On Table ..." would have been implemented by #2 then you'd at least get the index built twice as fast. >> We've done tests here with DBISAM, and the difference in performance was negligible until you got into using a lot of memory. << Just because later on, the index may eventually degrade into #1, is no reason not to optimize index creation. >> That's not what I said. I was just pointing out that the compactness of the index initially results in some performance issues when the index is first used because of all of the additional page splits required for the initial inserts. << I have a lot of tables that are readonly. Building a couple of indexes on a large table is an overnight process that I'd like to avoid. >> Did you try bumping up the memory settings in EDB or DBISAM ? That should help a bit, although the randomness of the data being input will always have a bearing on how fast an index build executes, regardless of how it is built. It's something we'll revisit later on with the EDB enterprise server and the ability to allocate larger blocks of memory for a given session without potentially causing issues with other sessions. -- Tim Young Elevate Software www.elevatesoft.com
Thu, Nov 1 2007 6:31 PM	Permanent Link
Dave Harrison	Tim Young [Elevate Software] wrote: > Dave, > > << I would have thought an SQL "Create Index On Table ..." would have been > implemented by #2 then you'd at least get the index built twice as fast. >> > > We've done tests here with DBISAM, and the difference in performance was > negligible until you got into using a lot of memory. And do you remember how much the performance increased if you throw say a gb of RAM at the problem? I don't mind loading the machine up with RAM if it solves the problem. > > << Just because later on, the index may eventually degrade into #1, is no > reason not to optimize index creation. >> > > That's not what I said. I was just pointing out that the compactness of the > index initially results in some performance issues when the index is first > used because of all of the additional page splits required for the initial > inserts. Ok > > << I have a lot of tables that are readonly. Building a couple of indexes on > a large table is an overnight process that I'd like to avoid. >> > > Did you try bumping up the memory settings in EDB or DBISAM ? How do I do that with DBISAM? I tried increasing the buffers in EDB but didn't notice any difference. > That should > help a bit, although the randomness of the data being input will always have > a bearing on how fast an index build executes, regardless of how it is > built. In my benchmark demo the data was random. In my real word application the data comes in with the first 2 key fields as constant for every 5k rows and only the date changes for each row (successive days). So the index values are pretty much non-random. It still takes 8 hours to load the data. > It's something we'll revisit later on with the EDB enterprise server and the > ability to allocate larger blocks of memory for a given session without > potentially causing issues with other sessions. I look forward to testing it. Ent class databases usually throw gb of RAM at the db server to solve this problem and it works quite well. I just find it a shame that I have a spare gb (or two) and DBISAM can't use it. Dave
Sat, Nov 3 2007 6:21 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Dave, << And do you remember how much the performance increased if you throw say a gb of RAM at the problem? I don't mind loading the machine up with RAM if it solves the problem. >> We never went that high, but it was performed with 2-100 megs of memory for the sorting, with the best performance above 75 megs or so. << How do I do that with DBISAM? >> See here: http://www.elevatesoft.com/dbisam4d7_customizing_engine.htm under "Memory Buffer Customizations". << I tried increasing the buffers in EDB but didn't notice any difference. >> As I said, it may not make any difference due to the randomness of the data in the rows. You have to start using memory buffers in the hundreds of megs if you're trying to optimize the indexing/sorting of very large tables. << In my benchmark demo the data was random. In my real word application the data comes in with the first 2 key fields as constant for every 5k rows and only the date changes for each row (successive days). So the index values are pretty much non-random. It still takes 8 hours to load the data. >> I'd have to do some actual playing around with your data to comment any further. If you'd like to adapt the benchmark app that you wrote to use the non-random data, I'd be happy to tell you what is going on. << I look forward to testing it. Ent class databases usually throw gb of RAM at the db server to solve this problem and it works quite well. I just find it a shame that I have a spare gb (or two) and DBISAM can't use it. >> It just wasn't designed that way. Large memory buffering in a database engine requires a design that cannot handle multi-user (non-C/S) usage, and vice-versa. -- Tim Young Elevate Software www.elevatesoft.com
Wed, Nov 7 2007 8:31 AM	Permanent Link
	Can I ask whether these buffers have any impact on multi-threaded operation? I ask because I have an app that may have up to 8 threads analysing data and storing the results into a single database with multiple tables. The end result is loads of data all being inserted "feathered" into the tables. I wonder if there is a change in the buffer sizes that would make this more efficient - there are a lot of indexes to update and I pondered dropping them before, storing the data, then indexing at the end. I didn't do this as I couldn't easily work out when the last thread had finished and do the indexing. But perhaps there are buffer or other configurations that might make the operations faster as they'd have to throw away less as presumably each thread has to keep re-reading the files to refresh its copy. I'm now at the stage where I can do some tweaking to test alternative options - are there any you'd recommend looking at? Thanks in advance, /Matthew Jones/
Wed, Nov 7 2007 1:40 PM	Permanent Link
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com	Matthew, << I'm now at the stage where I can do some tweaking to test alternative options - are there any you'd recommend looking at? >> I would start with increasing the buffer sizes. Increasing them, especially for the indexes, should help the updating quite a bit. However, you will reach a point of diminishing returns - the benefits don't increase forever as you increase the buffer sizes. -- Tim Young Elevate Software www.elevatesoft.com
Thu, Nov 8 2007 5:29 AM	Permanent Link
	Thanks. I'll report back when I get to experiment. /Matthew Jones/
Thu, Nov 8 2007 3:57 PM	Permanent Link
Dave Harrison	matthew@matthewdelme-jones.delme.com (Matthew Jones) wrote: > Thanks. I'll report back when I get to experiment. > > /Matthew Jones/ Matthew, Great. Let us know what speed improvements you saw for your different buffer sizes. Dave