Creating indexes - looking for a silver bullet |
Sun, Mar 18 2007 8:17 PM | Permanent Link |
"Ralf Mimoun" | Okeydokey...
The second run is, well, running. It's been going for about an hour now: 55 GB read, 28 GB written. For the first 5 minutes everything ran fine, with lots of CPU activity (the new code that predigests everything for the text index function) and only some seeks. After that, CPU activity dropped to a 15% average, and the drives are trying to grind their way through the case. WAG: not enough cache. If that damn thing is not finished by tomorrow morning, I'll buy two more 1 GB modules to get about 3.8 GB of cache. That's not as funny as it sounds. And it will lose even the last hint of amusement if that doesn't help either.

Ralf
Mon, Mar 19 2007 3:43 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Dave
No - but read the posts - DBISAM overnight, ElevateDB 40 mins Roy Lambert |
Mon, Mar 19 2007 6:16 AM | Permanent Link |
"Jose Eduardo Helminsky" | Ralf
Try using compound indexes even if you don't need to use the second field in the index. It will create different keys in the index. I remember this from CA-Clipper/DBF days; it was a nightmare.

Eduardo
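For readers unfamiliar with the idea, Eduardo's suggestion is a compound (multi-column) index: the key is built from two columns even when queries only ever filter on the first one. A minimal sketch, using sqlite3 in place of DBISAM (which is not available here) and hypothetical column choices:

```python
import sqlite3

# Compound-index sketch: key on (Autor, ID) rather than Autor alone.
# sqlite3 stands in for DBISAM; names mirror the table discussed in
# this thread but the index choice itself is hypothetical.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE ArtikelBib (ID INTEGER PRIMARY KEY, Autor TEXT, ISBN TEXT)"
)
con.executemany(
    "INSERT INTO ArtikelBib (Autor, ISBN) VALUES (?, ?)",
    [("Goethe", "978-1"), ("Schiller", "978-2")],
)

# The second column makes every key distinct even when Autor repeats.
con.execute("CREATE INDEX AutorIdIndex ON ArtikelBib (Autor, ID)")

# An equality search on the leading column can still use the index.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM ArtikelBib WHERE Autor = 'Goethe'"
).fetchall()
print(plan)
```

As Ralf notes in his reply, this helps only with ordinary B-tree indexes; it does nothing for a full-text index.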
Mon, Mar 19 2007 7:19 AM | Permanent Link |
"Ralf Mimoun" | Jose Eduardo Helminsky wrote:
> Try using compound indexes even if you don't need to use the second
> field in the index. It will create different keys in the index.

That won't help. I _need_ TEXTSEARCH, and for that I need a text index.

Ralf
Mon, Mar 19 2007 8:26 AM | Permanent Link |
"Ralf Mimoun" | Status report:
The Beast now has 4 GB RAM (4x 1 GB DDR2-667), of which Windows sees 3.4 GB (the standard effect, even if it hurts). Two stripe sets, each with two 160 GB Samsung 7,200 rpm drives.

Result #1: The additional RAM alone doesn't help much: about 20 MB/second read (I take the read speed as the performance indicator), less than 100% faster than with 2 GB.

Result #2: A 2.5 GB RAM disk. About 40 MB/second.

Result #3: SuperCache II, a piece of software that adds a read/write cache to drives. A 2.5 GB cache on D:, with lazy deferred write (meaning: data are written at shutdown). About 50 MB/second.

So, with a heap of hardware and some additional software, the thing runs about 5 times faster. But will it be fast enough?

Ralf
Mon, Mar 19 2007 8:33 AM | Permanent Link |
Arnd Baranowski | Hi Ralf,
We work with large amounts of data in DBISAM, and there is only one way we know to overcome an issue like the one you describe with that many indexes: split the table into several tables according to the main data areas used within it, and give those tables meaningful (automatically generated) names. Adjust your application to search these tables one after another. While this might slow things down a bit on small amounts of data, you will see an extreme performance boost on medium to large amounts. Moreover, you will be able to process an unlimited amount of data.

I hope you get your stuff finished in an acceptable manner!

Regards Arnd

Ralf Mimoun wrote:

> Hi all.
>
> I stumbled over a problem with DBISAM 4. Some gigabytes of data must be
> imported into a DBISAM table. The result: 2 million records, a 1 GB .DAT
> and a 1 GB .BLB file. The BLB file contains values from the only blob
> field, a memo.
>
> Importing the data is no problem at all, and fast enough (meaning:
> shorter than a summer night). But to achieve that, I have to drop all
> indexes I don't need for the import, and recreate them after the work is
> done. If I import with all indexes in place, it takes forever.
>
> Well, I started the index creation this morning at 1:30 am. Now it's
> 7:47 am, and it's not finished yet. It looks as if it will need some
> more hours, and the HD is seeking like mad. Oh, the test system,
> especially built for this test: Core2Duo 1.8, 1 GB RAM (should be 2,
> but one module was not ok), 2 Samsung 160 GB SATA2 drives as a stripe
> set (for speed). Yes, more RAM might help because of more cache, and
> there are 15,000 rpm SCSI drives around.
>
> Here is the table structure:
>
> /* SQL-92 Table Creation Script with DBISAM Extensions */
>
> CREATE TABLE IF NOT EXISTS "ArtikelBib"
> (
> "ID" AUTOINC,
> "ISBN" VARCHAR(15),
> "EAN" VARCHAR(15),
> "ONIXID" VARCHAR(15),
> "LibriNr" VARCHAR(15),
> "BestellNr" VARCHAR(15),
> "Bezeichnung" VARCHAR(80),
> "DetailBezeichnung" VARCHAR(80),
> "Autor" VARCHAR(30),
> "Verlag" VARCHAR(30),
> "VerlagVerkehrsNummer" VARCHAR(15),
> "Verkaufspreis" DECIMAL(0,4),
> "IstFestpreis" BOOLEAN,
> "MwStID" INTEGER,
> "Einband" VARCHAR(25),
> "Auflage" VARCHAR(20),
> "Beschreibung" MEMO,
> "Warengruppe" VARCHAR(5),
> "Meldenummer" INTEGER,
> "MeldenummerDatum" DATE,
> "Geaendert" BOOLEAN,
> PRIMARY KEY ("ID") COMPRESS NONE
> TEXT INDEX ("Bezeichnung","DetailBezeichnung","Autor","Beschreibung")
> STOP WORDS 'A AN AND BE FOR HOW IN IS IT OF ON OR THAT THE THIS TO WAS
> WHAT WHEN WHICH WHY WILL EIN EINE EINER UND IST FÜR WIE IN IST ER SIE ES
> WIR IHR VON AUF ODER UND DER DIE DAS WAR ALS WIRD'
> SPACE CHARS #1+#2+#3+#4+#5+#6+#7+#8+#9+#10+#11+#12+#13
> +#14+#15+#16+#17+#18+#19+#20+#21+#22+#23
> +#24+#25+#26+#27+#28+#29+#30+#31+#32+'*+'+
> ',-./:;<=>\`'
> INCLUDE CHARS '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_ab'+
> 'cdefghijklmnopqrstuvwxyz€‚ƒ„… ‡ˆ‰ ‹ŒŽ'+
> '‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·'+
> '¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß'+
> ' áâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþ'
> LOCALE CODE 1031
> USER MAJOR VERSION 1
> USER MINOR VERSION 22
> );
>
> CREATE NOCASE INDEX IF NOT EXISTS "AutorIndex" ON "ArtikelBib" ("Autor")
> COMPRESS FULL;
> CREATE NOCASE INDEX IF NOT EXISTS "ISBNIndex" ON "ArtikelBib" ("ISBN")
> COMPRESS FULL;
> CREATE NOCASE INDEX IF NOT EXISTS "EANIndex" ON "ArtikelBib" ("EAN")
> COMPRESS FULL;
> CREATE INDEX IF NOT EXISTS "LibriNrIndex" ON "ArtikelBib" ("LibriNr")
> COMPRESS FULL;
> CREATE INDEX IF NOT EXISTS "BestellNrIndex" ON "ArtikelBib"
> ("BestellNr") COMPRESS FULL;
> CREATE INDEX IF NOT EXISTS "ONIXIDIndex" ON "ArtikelBib" ("ONIXID")
> COMPRESS FULL;
> CREATE NOCASE INDEX IF NOT EXISTS "BezeichnungIndex" ON "ArtikelBib"
> ("Bezeichnung") COMPRESS FULL;
> CREATE NOCASE INDEX IF NOT EXISTS "DetailBezeichnungIndex" ON
> "ArtikelBib" ("DetailBezeichnung") COMPRESS FULL;
> CREATE INDEX IF NOT EXISTS "GeaendertIndex" ON "ArtikelBib"
> ("Geaendert") COMPRESS FULL;
>
> Some more information: the import function is multistaged. First, it
> drops all indexes (including the textsearch index, via ALTER TABLE
> "ArtikelBib" TEXT INDEX ()) and imports the records, leaving the memo
> field empty. That needs about 45 minutes. After that, the indexes are
> restored (2.5-3 hours) if there are no more steps. Otherwise, it only
> creates the EAN index (not measured, but it should be relatively fast).
>
> Then it drops all indexes except the EAN index for the next step, and
> imports the memo data. That leads to the 1 GB BLB file: about half the
> records get a value in that field, with an average size of 1000 bytes,
> English and German text. After that, all indexes are recreated. And
> that takes forever. Worse, the HD is so busy that I can't start the
> third step, which only copies files from a to b without any db activity.
>
> It seems that building the textsearch index is the culprit. What can I
> do to kick its ass and speed it up? Would it make a huge difference to
> upgrade the machine from 1 GB to 4 GB? Faster drives are out of the
> question because that would double the price of the machine.
>
> Ralf
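The staged import Ralf describes (drop the indexes you don't need during the load, bulk-insert, then rebuild each index in one pass) is a general bulk-load pattern, not specific to DBISAM. A minimal sketch of the principle, using sqlite3 in place of DBISAM and a simplified table; the detail of keeping one lookup index (EAN) alive during the load mirrors his second stage:

```python
import sqlite3

# Sketch of the drop-load-rebuild pattern from the post above.
# sqlite3 stands in for DBISAM; column names mirror the table in the
# thread, but the schema is heavily simplified for illustration.
con = sqlite3.connect(":memory:")
con.execute(
    """CREATE TABLE ArtikelBib (
           ID INTEGER PRIMARY KEY,
           EAN TEXT, Autor TEXT, Beschreibung TEXT)"""
)
con.execute("CREATE INDEX AutorIndex ON ArtikelBib (Autor)")
con.execute("CREATE INDEX EANIndex ON ArtikelBib (EAN)")

def staged_import(con, rows):
    # Stage 1: drop every index not needed while loading.
    # The EAN index stays, as in Ralf's second import step.
    con.execute("DROP INDEX AutorIndex")
    # Stage 2: bulk insert inside a single transaction.
    with con:
        con.executemany(
            "INSERT INTO ArtikelBib (EAN, Autor, Beschreibung)"
            " VALUES (?, ?, ?)",
            rows,
        )
    # Stage 3: rebuild the dropped index in one sequential pass,
    # which is far cheaper than maintaining it row by row.
    con.execute("CREATE INDEX AutorIndex ON ArtikelBib (Autor)")

staged_import(
    con,
    [("40%04d" % i, "Autor %d" % i, "Text") for i in range(1000)],
)
print(con.execute("SELECT COUNT(*) FROM ArtikelBib").fetchone()[0])
```

The win comes from building each B-tree once over sorted data instead of doing a random insertion per row; the text index in DBISAM is the extreme case of the same cost.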
Mon, Mar 19 2007 9:45 AM | Permanent Link |
"Ralf Mimoun" | Arnd Baranowski wrote:
> Split the table into several according to the main data areas used
> within the table and give these tables meaningful (automated) names.

I've known these techniques for decades. Unfortunately, I can't use them here without rewriting a huge amount of code, and they raise problems I have no clue how to solve.

Ralf
Mon, Mar 19 2007 10:18 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Ralf
When the dust has all settled don't forget to let us all know what you managed to do and how. Roy Lambert |
Mon, Mar 19 2007 2:10 PM | Permanent Link |
"Ralf Mimoun" | Roy Lambert wrote:
> When the dust has all settled don't forget to let us all know what
> you managed to do and how.

Wow. Done. I can't believe it.

As I wrote, I stopped the first try with 2 GB RAM after about 24 hours. It seems that was premature. With the new setup, the whole thing needed almost exactly 3 hours and ran about 7 to 8 times as fast as the first try. Meaning: about 10 MB/sec read with the original configuration, and about 75 MB/sec with the current one.

What's the difference? Faster drives did not help that much: as long as you don't cut the access time, the drive keeps stepping like crazy, and Raptor or even SCSI disks are expensive. 2 more GB of RAM (meaning 3.25 GB for Windows; the rest is lost behind the PCI memory area) helped a little bit. Twice as fast.

Now the real trick: I searched for information on optimizing the cache strategy of Windows 2k. What I found was a product that smelled like snake oil, you know, like those DoubleRAM pseudo products that were around about 5 years ago. It's called SuperCache II (http://www.superspeed.com). But surprise: it works. You can set up a block cache for each drive separately. So I set up a 2.5 GB cache for the drive with the table files and activated "deferred write", meaning writes really hit the disk only after some time. I also activated "Suspend lazywriting": nothing is written unless the cache is full or the system is shut down properly. Or you stop the caching, of course. You can change almost any parameter (cache size, sort algorithm etc.) on the fly. Shiny.

That setting is of course not safe. Nothing is written to disk; everything is strictly temporary. Don't ask me what the file structures look like if the system crashes. But who cares? That drive is for these imports only, and if anything fails, the import has to be rerun anyway.

The result is astonishing; the system is flying. I can only recommend that tool if you have huge imports that must be done asap.

Ralf
Mon, Mar 19 2007 2:41 PM | Permanent Link |
Dave Harrison | Roy Lambert wrote:
> No - but read the posts - DBISAM overnight, ElevateDB 40 mins

Yeah, but "overnight" in the far north is something like 6 months. So as a guesstimate, 40 mins versus 8 hrs (480 mins) makes it roughly 12x faster at building the index. That is a great improvement, as long as there is no loss in the speed of queries or ranges. Of course, DBISAM index building was always slow, so I'll have to do some benchmarking to see how fast it is relative to the competition.

Dave