Icon View Thread

The following is the text of the current message along with any replies.
Messages 11 to 20 of 21 total
Thread Is ElevateDb faster than DBISAM 4?
Wed, Oct 10 2007 9:06 AMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Dave,

<< I've uploaded the benchmark program to the binaries. EDB did better when
I decreased the transaction size to 100 rows instead of using 1500 or 15000.
It was about twice as fast as DBISAM when tuned correctly. >>

Is it just me, or is the ElevateDB index building portion of the code
missing ?

--
Tim Young
Elevate Software
www.elevatesoft.com

Wed, Oct 10 2007 9:30 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Tim


That would explain the consistency of performance Smiley

Roy Lambert
Wed, Oct 10 2007 5:15 PMPermanent Link

Dave Harrison
Tim Young [Elevate Software] wrote:

> Dave,
>
> << I've uploaded the benchmark program to the binaries. EDB did better when
> I decreased the transaction size to 100 rows instead of using 1500 or 15000.
> It was about twice as fast as DBISAM when tuned correctly. >>
>
> Is it just me, or is the ElevateDB index building portion of the code
> missing ?
>
Oops. No, it was me. Smile

I've fixed it and re-uploaded it. This now compares with my real-world
tests where adding rows to an EDB table is a bit slower than DBISAM. I
had hoped it would be considerably faster. Maybe there is some magical
tweaking that can be done by the wizard himself. Unfortuantely I my
magic wand is broken and I've got another one on order from
http://www.alivans.com/choose-your-wand.htm and it won't be here for a
few weeks. Smiley

Dave
Thu, Oct 11 2007 1:02 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Dave,

<< I've fixed it and re-uploaded it. This now compares with my real-world
tests where adding rows to an EDB table is a bit slower than DBISAM. >>

Well, for starters the EDB index format isn't going to be able to compress
near-unique values as well, and your tests are generating completely random
data that is near-unique in most cases.   That will be a disadvantage to
EDB's index compression and not DBISAM's.  Apart from that, I'll have to do
some more testing to find out what the whole situation is.  I've run EDB a
few times, but not DBISAM yet.

--
Tim Young
Elevate Software
www.elevatesoft.com

Thu, Oct 11 2007 3:24 PMPermanent Link

Dave Harrison
Tim Young [Elevate Software] wrote:
> Dave,
>
> << I've fixed it and re-uploaded it. This now compares with my real-world
> tests where adding rows to an EDB table is a bit slower than DBISAM. >>
>
> Well, for starters the EDB index format isn't going to be able to compress
> near-unique values as well, and your tests are generating completely random
> data that is near-unique in most cases.

The key is name up of the fields Name1,Code1,Date1.
Name1 may appear to be random, but each Name1 value will occur at least
4500 times, and up to 38,000 times when the table has 1 million rows.
Code1 has values only of "A","B","C". Date1 ranges over a 27 year
period. The combination of all 3 will make a nearly unique key, which
would be the case in a real world application.

> That will be a disadvantage to
> EDB's index compression and not DBISAM's.  Apart from that, I'll have to do
> some more testing to find out what the whole situation is.  I've run EDB a
> few times, but not DBISAM yet.
>

Ok. I thought I'd run it by you in case I was doing something wrong.
Wouldn't be the first time. SmileI was a little surprised to see DBISAM
was faster at loading the data.

Dave
Thu, Oct 11 2007 5:20 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Dave,

<< The key is name up of the fields Name1,Code1,Date1. Name1 may appear to
be random, but each Name1 value will occur at least 4500 times, and up to
38,000 times when the table has 1 million rows. Code1 has values only of
"A","B","C". Date1 ranges over a 27 year period. The combination of all 3
will make a nearly unique key, which would be the case in a real world
application. >>

Yes, that's exactly what I meant.  However, discard what I said - I assumed
that the DBISAM test was using compression for the indexes, and it isn't, so
what I said doesn't matter nearly as much.

<< Ok. I thought I'd run it by you in case I was doing something wrong.
Wouldn't be the first time. SmileI was a little surprised to see DBISAM was
faster at loading the data. >>

Well, as far as I can see, it isn't on the average:

DBISAM    Elapsed: 1,130.1 sec, #Rcds:1,000,000     885 Rcds/Sec  Trans
Flush: No Trans Size=1000 Build Index Before

DBISAM    Elapsed: 978.5 sec, #Rcds:1,000,000   1,022 Rcds/Sec  Trans Flush:
No Trans Size=5000 Build Index Before

DBISAM    Elapsed: 1,190.0 sec, #Rcds:1,000,000     840 Rcds/Sec  Trans
Flush: No Trans Size=10000 Build Index Before

DBISAM    Elapsed: 440.1 sec, #Rcds:1,000,000   2,272 Rcds/Sec  Trans Flush:
No Trans Size=15000 Build Index Before

DBISAM    Elapsed: 376.5 sec, #Rcds:1,000,000   2,656 Rcds/Sec  Trans Flush:
No Trans Size=30000 Build Index Before


ElevateDb Elapsed: 652.8 sec, #Rcds:1,000,000   1,532 Rcds/Sec  Trans
Size=1000 Build Index Before

ElevateDb Elapsed: 637.7 sec, #Rcds:1,000,000   1,568 Rcds/Sec  Trans
Size=5000 Build Index Before

ElevateDb Elapsed: 473.6 sec, #Rcds:1,000,000   2,112 Rcds/Sec  Trans
Size=10000 Build Index Before

ElevateDb Elapsed: 446.8 sec, #Rcds:1,000,000   2,238 Rcds/Sec  Trans
Size=15000 Build Index Before

ElevateDb Elapsed: 464.9 sec, #Rcds:1,000,000   2,151 Rcds/Sec  Trans
Size=30000 Build Index Before

There are a couple of instances that are reproducible in which DBISAM
becomes completely disk-bound during the process.  The two that stick out
are the 1000 row and 10000 row tests that I did.  EDB, on the other hand,
does not have this problem since it doesn't touch nearly as much index data
during an average index update due to not having the statistics in the
indexes.  This is what I was referring to when I spoke about the index
updates being faster.

--
Tim Young
Elevate Software
www.elevatesoft.com

Fri, Oct 12 2007 12:57 AMPermanent Link

Dave Harrison
Tim Young [Elevate Software] wrote:

> Dave,
>
> << The key is name up of the fields Name1,Code1,Date1. Name1 may appear to
> be random, but each Name1 value will occur at least 4500 times, and up to
> 38,000 times when the table has 1 million rows. Code1 has values only of
> "A","B","C". Date1 ranges over a 27 year period. The combination of all 3
> will make a nearly unique key, which would be the case in a real world
> application. >>
>
> Yes, that's exactly what I meant.  However, discard what I said - I assumed
> that the DBISAM test was using compression for the indexes, and it isn't, so
> what I said doesn't matter nearly as much.
>
> << Ok. I thought I'd run it by you in case I was doing something wrong.
> Wouldn't be the first time. SmileI was a little surprised to see DBISAM was
> faster at loading the data. >>
>
>
> There are a couple of instances that are reproducible in which DBISAM
> becomes completely disk-bound during the process.  The two that stick out
> are the 1000 row and 10000 row tests that I did.  EDB, on the other hand,
> does not have this problem since it doesn't touch nearly as much index data
> during an average index update due to not having the statistics in the
> indexes.  This is what I was referring to when I spoke about the index
> updates being faster.

Each engine has to be optimized for the best transaction size.

The best DBISAM speed I found was:
DBISAM    Elapsed: 137.0 sec, #Rcds:1,000,000   7,299 Rcds/Sec  Trans
Flush: No Trans Size=7500 Build Index Before

beats the best EDB speed:
ElevateDb Elapsed: 193.0 sec, #Rcds:1,000,000   5,182 Rcds/Sec  Trans
Size=6000 Build Index Before

by about 40%. So if EDB doesn't touch the index as much as DBISAM, then
EDB is spending a lot more time some place else.


BTW, I'm not sure why you're getting such lousy times:
Yours:

> DBISAM    Elapsed: 1,190.0 sec, #Rcds:1,000,000     840 Rcds/Sec
> Trans Flush: No Trans Size=10000 Build Index Before

Mine is over 8x faster for same transaction size:
DBISAM    Elapsed: 144.1 sec, #Rcds:1,000,000   6,941 Rcds/Sec  Trans
Flush: No Trans Size=10000 Build Index Before


Yours:
> ElevateDb Elapsed: 652.8 sec, #Rcds:1,000,000   1,532 Rcds/Sec  Trans
> Size=1000 Build Index Before

Mine is 3x faster:
ElevateDb Elapsed: 213.5 sec, #Rcds:1,000,000   4,685 Rcds/Sec  Trans
Size=1000 Build Index Before


Are you running yours on a laptop?


Dave
Fri, Oct 12 2007 9:46 AMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Dave,

<< Each engine has to be optimized for the best transaction size. >>

Yes, I understand.   However, my point is that EDB has the best overall
performance with *any* transaction size, not just selective ones.

<< The best DBISAM speed I found was:
DBISAM    Elapsed: 137.0 sec, #Rcds:1,000,000   7,299 Rcds/Sec  Trans
Flush: No Trans Size=7500 Build Index Before

beats the best EDB speed:
ElevateDb Elapsed: 193.0 sec, #Rcds:1,000,000   5,182 Rcds/Sec  Trans
Size=6000 Build Index Before

by about 40%. So if EDB doesn't touch the index as much as DBISAM, then EDB
is spending a lot more time some place else. >>

I can't account for what happens on your particular machine without being
able to profile it here.  All I can go by is what I see here when I profile
the application on my machine (Intel Pentium D 2.8GHz desktop with 1GB of
RAM with SpeedStep disabled).  From my tests, DBISAM is becoming I/O bound
on certain tests while EDB provides fairly-consistent results across the
board.  As far as EDB spending a lot more time elsewhere, there really isn't
any where else that would account for a difference that is as pronounced as
that.  I think we'll need to get some other machines in this mix in order to
rule out any particular issue with your machine or my machine with regard to
this test.

<< BTW, I'm not sure why you're getting such lousy times: >>

Well, we're using different machines, for starters. Smiley I don't use
particularly fast machines because they tend to give database engine
developers a false sense of performance for the average machine.  I'm also
using D7 instead of D2007 for the compilation, but I don't think that there
is that much difference between the two compilers.  I'm doing some tests
right now to rule out any differences with FastMM vs. the native Delphi
memory manager.

--
Tim Young
Elevate Software
www.elevatesoft.com

Fri, Oct 12 2007 10:41 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Dave,

Okay, after testing with FastMM the differences become more pronounced in
DBISAM's favor.  Further testing reveals that the issue is the Name1 column
in the two indexes.  EDB has to do quite a bit more moving around of data
for strings since it uses long strings (AnsiStrings and WideStrings) for
comparing, etc.  This goes back to the original design requirements for not
using direct pointer access for .NET managed code portability.  String
accesses require some allocation and move-ing of memory, whereas DBISAM
directly addresses the column in the row buffer.  If you remove the Name1
column from both indexes, you will see this phenomenon in action.  For
example, this is the result that I get without the Name1 column in the
indexes and using the FastMM memory manager:

ElevateDb Elapsed: 169.9 sec, #Rcds:1,000,000   5,886 Rcds/Sec  Trans
Size=15000 Build Index Before

DBISAM    Elapsed: 223.0 sec, #Rcds:1,000,000   4,484 Rcds/Sec  Trans Flush:
No Trans Size=15000 Build Index Before

DBISAM starts off fast, but degrades quickly, whereas EDB degrades fairly
gracefully as it does with the Name1 column included.

I'm going to have to investigate some caching techniques to work around this
issue, but for now the main thing to take away from this is to avoid using
string columns that are ~20 characters or more in the indexes if you're
interested in the maximum amount of performance.  In your test, the
randomness of the Name1 input also exacerbates the issue since it causes
many more column comparisons in the index keys than a less random input
would cause.

BTW, here are my original FastMM results with the Name1 column included:

DBISAM    Elapsed: 275.9 sec, #Rcds:1,000,000   3,625 Rcds/Sec  Trans Flush:
No Trans Size=15000 Build Index Before

ElevateDb Elapsed: 336.3 sec, #Rcds:1,000,000   2,973 Rcds/Sec  Trans
Size=15000 Build Index Before

Tim Young
Elevate Software
www.elevatesoft.com

Fri, Oct 12 2007 11:41 PMPermanent Link

Dave Harrison
Tim Young [Elevate Software] wrote:

> Dave,
>
> Okay, after testing with FastMM the differences become more pronounced in
> DBISAM's favor.  Further testing reveals that the issue is the Name1 column
> in the two indexes.  EDB has to do quite a bit more moving around of data
> for strings since it uses long strings (AnsiStrings and WideStrings) for
> comparing, etc.  This goes back to the original design requirements for not
> using direct pointer access for .NET managed code portability.  String
> accesses require some allocation and move-ing of memory, whereas DBISAM
> directly addresses the column in the row buffer.  If you remove the Name1
> column from both indexes, you will see this phenomenon in action.  For
> example, this is the result that I get without the Name1 column in the
> indexes and using the FastMM memory manager:
>
> ElevateDb Elapsed: 169.9 sec, #Rcds:1,000,000   5,886 Rcds/Sec  Trans
> Size=15000 Build Index Before
>
> DBISAM    Elapsed: 223.0 sec, #Rcds:1,000,000   4,484 Rcds/Sec  Trans Flush:
> No Trans Size=15000 Build Index Before
>
> DBISAM starts off fast, but degrades quickly, whereas EDB degrades fairly
> gracefully as it does with the Name1 column included.

That's what I discovered too. EDB speed doesn't diminish as quickly as
other databases.

>
> I'm going to have to investigate some caching techniques to work around this
> issue, but for now the main thing to take away from this is to avoid using
> string columns that are ~20 characters or more in the indexes if you're
> interested in the maximum amount of performance.  In your test, the
> randomness of the Name1 input also exacerbates the issue since it causes
> many more column comparisons in the index keys than a less random input
> would cause.
>
> BTW, here are my original FastMM results with the Name1 column included:
>
> DBISAM    Elapsed: 275.9 sec, #Rcds:1,000,000   3,625 Rcds/Sec  Trans Flush:
> No Trans Size=15000 Build Index Before
>
> ElevateDb Elapsed: 336.3 sec, #Rcds:1,000,000   2,973 Rcds/Sec  Trans
> Size=15000 Build Index Before

So without the String it is 50% to 60% faster. A lot of databases will
index faster without string fields in the index. The only solution I see
is to convert the string into a unique integer that is maintained in
another table. It could work but would be a bit difficult to sort it
because new string values could be added every month would mean
reordering the integers. I think this is still feasible though.

Thanks for your feedback.

Dave
« Previous PagePage 2 of 3Next Page »
Jump to Page:  1 2 3
Image