Thread: IMPORT TABLE performance
Tue, Apr 14 2009 7:56 AM

Dan
I'm using "IMPORT TABLE" to feed data into a table from a file. It's a simple 3-field
table: INTEGER, FLOAT, FLOAT. I do the import before I add any indexes.

The speed I'm getting on my system is around 11,000 records/sec, which is three times
slower than I expected for such a simple import operation with no indexes involved.

Could you please look into the code for IMPORT TABLE and see if there are any
bottlenecks there? Are all the inserts done in a single transaction? Any improvement
would be greatly appreciated, as this single step accounts for 80% of my execution time.
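
For reference, the statements involved look roughly like this (the table, store, and
file names are illustrative, and the exact IMPORT TABLE clauses should be checked
against the manual for your EDB version):

-- Plain three-column table, created before any extra indexes are added
CREATE TABLE MyData
(
ID INTEGER,
Value1 FLOAT,
Value2 FLOAT
)

-- Bulk-load the CSV file from a configured store
IMPORT TABLE MyData
FROM "mydata.csv" IN STORE "Data"
DELIMITER CHAR ','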

Thanks in advance!
Tue, Apr 14 2009 8:06 AM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Dan,

<< I'm using "IMPORT TABLE" to feed data into a table from a file. It's a simple
3-field table: INTEGER, FLOAT, FLOAT. I do the import before I add any indexes.

The speed I'm getting on my system is around 11,000 records/sec, which is three
times slower than I expected for such a simple import operation with no indexes
involved. >>

Actually, EDB uses an internal row ID index as the primary key if you don't define
one for the table.
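
So there is still one index being maintained during the import even without a declared
primary key. If you want control over which key that is, you can declare it explicitly
when creating the table; a sketch with illustrative names (check the CREATE TABLE topic
in the manual for the exact constraint syntax):

CREATE TABLE MyData
(
ID INTEGER,
Value1 FLOAT,
Value2 FLOAT,
CONSTRAINT PKey PRIMARY KEY (ID)
)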

How many total rows are you importing, and how often does the import occur?

11,000 rows/sec is actually pretty good for an import into a shared table.
What is the basis for your expectation of 33,000 rows/sec on an import?  If
you're comparing it to a TList population or something similar, then that is
not a valid comparison.

--
Tim Young
Elevate Software
www.elevatesoft.com

Tue, Apr 14 2009 1:46 PM

Dan
"Tim Young [Elevate Software]" wrote:

<< How many total rows are you importing, and how often does the import occur? >>

The import is part of a function in an EDB remote SQL server app; that's why I need to
optimize this process, as it is frequently used.
There are 30,000 to 100,000 rows per call, and the problem arises because this is not a
one-time call but a commonly used function.
The import is done locally on the SQL server.

<< 11,000 rows/sec is actually pretty good for an import into a shared table.
What is the basis for your expectation of 33,000 rows/sec on an import? >>

I estimated the speed based both on the speed of the process that generates the data
being imported (a full-text search engine) and on the performance of SQLite for the
same operation.

I wonder if the IMPORT can be improved, or if you have any other ideas for getting the
CSV file data into EDB at maximum speed.

Thanks.
Tue, Apr 14 2009 2:04 PM

Dan
I forgot to mention that for the import the user has exclusive rights; there is no
shared access during that step.
Thu, Apr 16 2009 5:08 PM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Dan,

<< I estimated the speed based both on the speed of the process that generates the
data being imported (a full-text search engine) and on the performance of SQLite for
the same operation. >>

Comparisons to SQLite won't be very helpful, since it doesn't even support multi-user
access.

BTW, I forgot to ask: which version of EDB are you testing, and are you wrapping the
import in a transaction at all?

--
Tim Young
Elevate Software
www.elevatesoft.com

Thu, Apr 16 2009 5:11 PM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Dan,

<< I forgot to mention that for the import the user has exclusive rights; there is no
shared access during that step. >>

Yes, but IMPORT TABLE always opens the table in a shared manner.

Per my previous message about the version being used, if you are trying this
with a build prior to 2.02 Build 11, try it with Build 11 and see if that
helps.  If you're already using Build 11, then that is about as fast as
things can get.

--
Tim Young
Elevate Software
www.elevatesoft.com

Wed, Apr 22 2009 4:22 AM

Dan
"Tim Young [Elevate Software]" wrote:

Dan,

<< Yes, but IMPORT TABLE always opens the table in a shared manner.

Per my previous message about the version being used, if you are trying this
with a build prior to 2.02 Build 11, try it with Build 11 and see if that
helps.  If you're already using Build 11, then that is about as fast as
things can get. >>

I did the upgrade and things improved a bit, but I'm still looking for a way to speed
up the import.

I start from the fact that for the import the user has exclusive rights, with no shared
access during that step, and that writing the data out in CSV format is very fast. Can
you suggest how to achieve the fastest possible speed for that operation? It is clearly
the bottleneck for the whole process: the FTS takes 10% of the total execution time,
the CSV write 5%, the import 80%, and the query the remaining 10%. A task as mundane as
the data import taking 80% of the time is not OK for me.

Maybe I could skip both the CSV format and IMPORT, and write the data directly to the
file in ElevateDB table format? I have exclusive access, and it is a very simple
structure: INTEGER, FLOAT, FLOAT. Do you have any other suggestion for getting that
data into the ElevateDB environment?

Thanks again.
Wed, Apr 22 2009 7:49 AM

Roy Lambert

NLH Associates

Team Elevate

Dan


Out of interest, why can't you do the query on the SQL server?

Roy Lambert
Wed, Apr 22 2009 11:25 AM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Dan,

<< I start from the fact that for the import the user has exclusive rights, with no
shared access during that step, and that writing the data out in CSV format is very
fast. Can you suggest how to achieve the fastest possible speed for that operation?
It is clearly the bottleneck for the whole process: the FTS takes 10% of the total
execution time, the CSV write 5%, the import 80%, and the query the remaining 10%. A
task as mundane as the data import taking 80% of the time is not OK for me. >>

I'm sorry, but that is about as fast as ElevateDB will get with an import
(B11).  What you're expecting is performance that is equivalent to an
in-memory collection or some other basic type of data structure, and that is
not how a database engine works.  It may appear mundane to you, but there is
a lot more going on with an import.

<< Maybe I could skip both the CSV format and IMPORT, and write the data directly to
the file in ElevateDB table format? >>

We don't recommend doing that.  Any change to the way that ElevateDB handles the table
files could cause your code to break, and you would still have to re-index the table
after you're done.

--
Tim Young
Elevate Software
www.elevatesoft.com

Wed, Apr 22 2009 11:29 AM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Dan,

One thing that you may try is bumping up the buffer sizes for the table,
specifically the MAX ROW BUFFER SIZE:

http://www.elevatesoft.com/manual?action=mantopic&id=edb2sql&category=10&topic=149

http://www.elevatesoft.com/edb_faqt_3.htm

Something like this:

ALTER TABLE MyTable
MAX ROW BUFFER SIZE 1048576

That will permit ElevateDB to use more memory for buffering rows during the
import.
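
For example, the whole sequence might look like this (the store and file names are
illustrative, and the IMPORT TABLE clauses should be checked against the manual for
your version):

-- Allow up to 1 MB (1048576 bytes) of memory for buffering rows
ALTER TABLE MyTable
MAX ROW BUFFER SIZE 1048576

-- Then run the bulk load against the enlarged buffer
IMPORT TABLE MyTable
FROM "mydata.csv" IN STORE "Data"
DELIMITER CHAR ','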

--
Tim Young
Elevate Software
www.elevatesoft.com
