Thread: Creating indexes - looking for a silver bullet
Sun, Mar 18 2007 8:17 PM

"Ralf Mimoun"
Okeydokey...

The second run is, well, running. It has been going for about an hour: 55 GB
read, 28 GB written. For the first 5 minutes everything ran fine, lots of CPU
activity (the new code that predigests everything for the text index function)
and only some seeks. After that, the CPU activity dropped to a 15% average, and
the drives are trying to grind their way through the case. WAG: not enough
cache. If that damn thing is not finished by tomorrow morning, I'll buy two
more 1 GB modules to get about 3.8 GB of cache. That's not as funny as it
sounds, and it will lose even the last hint of amusement if that doesn't help
either.

Ralf
Mon, Mar 19 2007 3:43 AM

Roy Lambert

NLH Associates

Team Elevate

Dave


No - but read the posts - DBISAM overnight, ElevateDB 40 mins

Roy Lambert
Mon, Mar 19 2007 6:16 AM

"Jose Eduardo Helminsky"
Ralf

Try using compound indexes even if you don't need the second field in the
index. It will create different keys in the index. I remember
CA-Clipper/DBF - it was a nightmare.
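For example, a sketch against Ralf's table (the second field here is just an
illustration - pick whatever fits the data):

/* Hypothetical compound index - an untested sketch */
CREATE NOCASE INDEX IF NOT EXISTS "AutorBezIndex" ON "ArtikelBib"
("Autor","Bezeichnung") COMPRESS FULL;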

Eduardo

Mon, Mar 19 2007 7:19 AM

"Ralf Mimoun"
Jose Eduardo Helminsky wrote:
> Ralf
>
> Try using compound indexes even if you don't need the second
> field in the index. It will create different keys in the index. I
> remember CA-Clipper/DBF - it was a nightmare.

That won't help. I _need_ TEXTSEARCH, and for that I need a text index.
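
(For reference, a TEXTSEARCH query looks roughly like the sketch below -
syntax quoted from memory, so check the DBISAM SQL reference. Without a text
index, a word search like this would have to scan every row.)

/* Sketch of a TEXTSEARCH filter - hypothetical search term */
SELECT * FROM "ArtikelBib"
WHERE TEXTSEARCH('Goethe' IN "Autor");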

Ralf
Mon, Mar 19 2007 8:26 AM

"Ralf Mimoun"
Status report:

The Beast now has 4 GB RAM (4x 1 GB DDR2-667), and Windows gets 3.4 GB of it
(the standard effect, even if it hurts). Two stripe sets, each with two 160 GB
Samsung 7,200 rpm drives.

Result #1: The additional RAM alone does not do much good: about 20 MB/second
read (I take the read speed as the performance indicator), less than 100%
faster than with 2 GB.

Result #2: 2.5 GB RAM disk. About 40 MB/second.

Result #3: SuperCache II, a piece of software that adds a read/write cache to
drives. A 2.5 GB cache on D: with lazy deferred write (meaning data is only
written at shutdown). About 50 MB/second.

So, with a heap of hardware and some additional software, that thing runs
about 5 times faster. But will it be fast enough?

Ralf
Mon, Mar 19 2007 8:33 AM

Arnd Baranowski
Hi Ralf,

We work with large amounts of data in DBISAM, and there is only one thing we
have found that overcomes an issue like the one you describe with this many
indexes: split the table into several tables according to the main data areas
used within it, give those tables meaningful (automated) names, and adjust
your application to search these tables one after another (see the sketch
below). While this might be a bit slower on small amounts of data, you will
see an extreme performance boost on medium to large amounts. Moreover, you
will be able to process an unlimited amount of data.
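
A sketch of the idea, with hypothetical table names, assuming each split table
is created with the same script as "ArtikelBib":

/* Search each split table one after another and merge the results */
SELECT * FROM "ArtikelBib_Belletristik" WHERE TEXTSEARCH('Goethe' IN "Autor");
SELECT * FROM "ArtikelBib_Sachbuch" WHERE TEXTSEARCH('Goethe' IN "Autor");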

I hope you get your stuff finished in an acceptable manner!

Regards

Arnd

Ralf Mimoun wrote:
> Hi all.
>
> I stumbled over a problem with DBISAM 4. Some gigabytes of data must be
> imported into a DBISAM table. The result: 2 million records, 1 GB .DAT
> and 1 GB .BLB file. The BLB file contains values from the only blob
> field, a memo.
>
> Importing the data is no problem at all, and fast enough (meaning:
> shorter than a summer night). But to achieve that, I have to drop all
> indexes I don't need for the import and recreate them after the work is
> done. If I import with all indexes in place, it takes forever.
>
> Well, I started the index creation this morning at 1:30 am. Now it's 7:47
> am - and it's not finished yet. It looks as if it will need some more
> hours, and the HD is seeking like mad. Oh, the test system, especially
> built for this test: Core2Duo 1.8, 1 GB RAM (should be 2, but one module
> was not OK), 2 Samsung 160 GB SATA2 drives as a stripe set (for speed).
> Yes, more RAM might help because of more cache, and there are 15,000 rpm
> SCSI drives around.
>
> Here is the table structure:
>
> /* SQL-92 Table Creation Script with DBISAM Extensions */
>
> CREATE TABLE IF NOT EXISTS "ArtikelBib"
> (
>   "ID" AUTOINC,
>   "ISBN" VARCHAR(15),
>   "EAN" VARCHAR(15),
>   "ONIXID" VARCHAR(15),
>   "LibriNr" VARCHAR(15),
>   "BestellNr" VARCHAR(15),
>   "Bezeichnung" VARCHAR(80),
>   "DetailBezeichnung" VARCHAR(80),
>   "Autor" VARCHAR(30),
>   "Verlag" VARCHAR(30),
>   "VerlagVerkehrsNummer" VARCHAR(15),
>   "Verkaufspreis" DECIMAL(0,4),
>   "IstFestpreis" BOOLEAN,
>   "MwStID" INTEGER,
>   "Einband" VARCHAR(25),
>   "Auflage" VARCHAR(20),
>   "Beschreibung" MEMO,
>   "Warengruppe" VARCHAR(5),
>   "Meldenummer" INTEGER,
>   "MeldenummerDatum" DATE,
>   "Geaendert" BOOLEAN,
> PRIMARY KEY ("ID") COMPRESS NONE
> TEXT INDEX ("Bezeichnung","DetailBezeichnung","Autor","Beschreibung")
> STOP WORDS 'A AN AND BE FOR HOW IN IS IT OF ON OR THAT THE THIS TO WAS
> WHAT WHEN WHICH WHY WILL EIN EINE EINER UND IST FÜR WIE IN IST ER SIE ES
> WIR IHR VON AUF ODER UND DER DIE DAS WAR ALS WIRD'
> SPACE CHARS #1+#2+#3+#4+#5+#6+#7+#8+#9+#10+#11+#12+#13
> +#14+#15+#16+#17+#18+#19+#20+#21+#22+#23
> +#24+#25+#26+#27+#28+#29+#30+#31+#32+'*+'+
> ',-./:;<=>\`'
> INCLUDE CHARS '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_ab'+
> 'cdefghijklmnopqrstuvwxyz€‚ƒ„… ‡ˆ‰ ‹ŒŽ'+
> '‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·'+
> '¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß'+
> ' áâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþ'
> LOCALE CODE 1031
> USER MAJOR VERSION 1
> USER MINOR VERSION 22
> );
>
> CREATE NOCASE INDEX IF NOT EXISTS "AutorIndex" ON "ArtikelBib" ("Autor")
> COMPRESS FULL;
> CREATE NOCASE INDEX IF NOT EXISTS "ISBNIndex" ON "ArtikelBib" ("ISBN")
> COMPRESS FULL;
> CREATE NOCASE INDEX IF NOT EXISTS "EANIndex" ON "ArtikelBib" ("EAN")
> COMPRESS FULL;
> CREATE INDEX IF NOT EXISTS "LibriNrIndex" ON "ArtikelBib" ("LibriNr")
> COMPRESS FULL;
> CREATE INDEX IF NOT EXISTS "BestellNrIndex" ON "ArtikelBib"
> ("BestellNr") COMPRESS FULL;
> CREATE INDEX IF NOT EXISTS "ONIXIDIndex" ON "ArtikelBib" ("ONIXID")
> COMPRESS FULL;
> CREATE NOCASE INDEX IF NOT EXISTS "BezeichnungIndex" ON "ArtikelBib"
> ("Bezeichnung") COMPRESS FULL;
> CREATE NOCASE INDEX IF NOT EXISTS "DetailBezeichnungIndex" ON
> "ArtikelBib" ("DetailBezeichnung") COMPRESS FULL;
> CREATE INDEX IF NOT EXISTS "GeaendertIndex" ON "ArtikelBib"
> ("Geaendert") COMPRESS FULL;
>
> Some more information: the import function is multistaged. First, it
> drops all indexes (including the textsearch index, ALTER TABLE
> "ArtikelBib" TEXT INDEX ()) and imports the records leaving the memo
> field empty. That needs about 45 minutes. After that, the indexes are
> restored (2.5-3 hours) if there are no more steps. Otherwise, it only
> creates the EAN index (not measured, should be relatively fast).
>
> Then it drops all indexes except the EAN index for the next step, and
> imports the memo data. That leads to the 1 GB BLB file; about half the
> records get a value in that field, with an average size of 1000 bytes.
> English and German text. After that, all indexes are recreated. And
> that takes forever. Worse, the HD is so busy that I can't start the
> third step, which only copies files from A to B without any db activity.
>
> It seems that building the textsearch index is the culprit. What can I
> do to kick its ass and speed it up? Would it make a huge difference to
> upgrade the machine from 1 GB to 4 GB? Faster drives are out of the
> question because they would double the price of the machine.
>
> Ralf
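
(For reference, the drop/recreate cycle described in the quoted post boils
down to something like the sketch below. The ALTER TABLE line and the CREATE
INDEX statement come from the post itself; the DROP INDEX syntax is quoted
from memory, so check the DBISAM SQL reference.)

/* Before the import: drop the text index and the secondary indexes */
ALTER TABLE "ArtikelBib" TEXT INDEX ();
DROP INDEX IF EXISTS "ArtikelBib"."AutorIndex";
/* ... likewise for the other secondary indexes ... */

/* After the import: recreate them, for example */
CREATE NOCASE INDEX IF NOT EXISTS "AutorIndex" ON "ArtikelBib" ("Autor")
COMPRESS FULL;
/* ... and restore the TEXT INDEX with its field list via ALTER TABLE */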
Mon, Mar 19 2007 9:45 AM

"Ralf Mimoun"
Arnd Baranowski wrote:

> Split the table into several according to the main data areas used
> within the table and give these tables meaningful (automated)
> names.

I have known these techniques for decades. But unfortunately I can't use them
here without rewriting a huge amount of code, and they raise problems I don't
have a clue how to solve.

Ralf
Mon, Mar 19 2007 10:18 AM

Roy Lambert

NLH Associates

Team Elevate

Ralf


When the dust has all settled, don't forget to let us all know what you managed to do and how.

Roy Lambert
Mon, Mar 19 2007 2:10 PM

"Ralf Mimoun"
Roy Lambert wrote:
> Ralf
>
>
> When the dust has all settled, don't forget to let us all know what
> you managed to do and how.

Wow. Done. I can't believe it.

As I wrote, I stopped the first try with 2 GB RAM after about 24 hours. It
seems that was premature. With the new setup, the whole thing needed almost
exactly 3 hours and ran about 7 to 8 times as fast as the first try. That
means: about 10 MB/sec read with the original configuration, and about
75 MB/sec with the current config.

What made the difference? Faster drives alone did not help that much; as long
as you don't cut the access time, the drive keeps stepping like crazy. But
Raptors or even SCSI disks are expensive.

2 more GB of RAM (meaning: 3.25 GB for Windows, the rest is lost behind the
PCI memory area) helped a little bit. Twice as fast.

Now the real trick: I searched for information on optimizing the cache
strategy of Windows 2k. What I found was a product that smelled like snake
oil - you know, like those DoubleRAM pseudo-products that were around about
5 years ago. It's called SuperCache II (http://www.superspeed.com).

But surprise: it works. You can set up a block cache for each drive
separately. So I set up a 2.5 GB cache for the drive with the table files and
activated "deferred write" - writes only hit the disk after some time. I also
activated "Suspend lazywriting" - nothing is written unless the cache is full
or the system is shut down properly (or you stop the caching, of course). You
can change almost any parameter (cache size, sort algorithm, etc.) on the fly.
Shiny.

That setting is of course not safe. Nothing is written to disk; everything is
strictly temporary. Don't ask me what the file structures would look like if
the system crashed. But who cares? That drive is for these imports only, and
if anything fails, the import has to be rerun anyway.

The result is astonishing - the system is flying. I can only recommend that
tool if you have huge imports that must be done ASAP.

Ralf
Mon, Mar 19 2007 2:41 PM

Dave Harrison
Roy Lambert wrote:
> Dave
>
>
> No - but read the posts - DBISAM overnight, ElevateDB 40 mins
>
> Roy Lambert
>

Yeah, but "overnight" in the far north is something like 6 months. :-)

So as a guesstimate, it's 40 minutes versus 8 hours (480 minutes), which makes
it roughly 10x faster at building the index. This is a great improvement as
long as there is no loss in the speed of the queries or ranges. Of course,
DBISAM index building was always slow, so I'll have to do some benchmarking to
see how fast it is relative to the competition.

Dave