Thread: Blob file re-use
Mon, Feb 1 2010 4:40 AM

(Matthew Jones)
> What is the largest file size that is being added ?  Is there a
> large disparity in the smallest file size to the largest file size,
> of the files that are being added ?  The max you should be seeing
> is the largest file size * the highest number of rows in the table,
> provided that all rows always have BLOBs allocated for them.

There is indeed a disparity. Some of the files can be quite small, and some are
completely massive. I've checked the logs, and here are typical byte counts for the
files that get stored:

349173
3787424
259631
2252546
107071
728396
316247
2034116
211357
2741618
59530
556169
305189
1669440
50067
934238
278029
794198
748567
5098724
295931
689918
45465
99854
45271
93560
360684
2099721
159419
371498
90775
199700
136667
318428
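
For reference, a quick summary of the spread in those byte counts (a small Python sketch; the values are simply copied from the list above):

# Byte counts taken from the log excerpt above.
sizes = [
    349173, 3787424, 259631, 2252546, 107071, 728396, 316247, 2034116,
    211357, 2741618, 59530, 556169, 305189, 1669440, 50067, 934238,
    278029, 794198, 748567, 5098724, 295931, 689918, 45465, 99854,
    45271, 93560, 360684, 2099721, 159419, 371498, 90775, 199700,
    136667, 318428,
]

print(f"files:    {len(sizes)}")
print(f"smallest: {min(sizes):,} bytes")
print(f"largest:  {max(sizes):,} bytes")
print(f"largest/smallest ratio: {max(sizes) / min(sizes):.0f}x")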

From what you are saying, it sounds like you are allocating the largest size across
the board. If that is the case, that probably explains it, as there can be times
when we store a few of these data sets.

I'll check that the deletes are happening, but I think I'll look to optimise the
tables daily. Or perhaps have two tables, one for small files and one for large ones.

/Matthew Jones/
Mon, Feb 1 2010 11:09 AM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Roy,

<< Couldn't that cause what Matthew is seeing? What happens to BLOB blocks
from rows that have been deleted when the row is overwritten but the new
blob is smaller than the previous one? >>

Yes, but only to a certain limit, and nowhere near 120GB, based upon what
he's stated.

--
Tim Young
Elevate Software
www.elevatesoft.com

Mon, Feb 1 2010 11:09 AM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Matthew,

<< From what you are saying, it sounds like you are allocating the largest
size across the board. >>

No, we're simply allocating what is requested.  However, if there are a lot of
constant insertions and deletions, then the size of each allocated BLOB can
be, at times, beyond what is necessary for the current row, but the next
re-use could be just right.  IOW, the allocations balance out to the max
that is necessary in a random fashion.

<< If that is the case, that probably explains it, as there can be times
when we store a few of these data sets. >>

120GB still seems excessive.  How many records are in the table ?  You said
that the typical live contents are 40 or so files around 10MB max in size.
That means that the max .blb file size should be 400MB, i.e. every single
record allocated to a size of 10MB.
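
As a rough sanity check (a minimal Python sketch; the ~40 rows, ~10MB worst-case BLOB, and ~120GB observed file are the figures quoted in this thread):

# Back-of-the-envelope check: expected worst-case .blb size versus observed.
rows = 40                                  # typical live record count
max_blob_bytes = 10 * 1024 ** 2            # assume ~10 MB worst case per row

expected_max_blb = rows * max_blob_bytes   # every row allocated to the maximum
observed_blb = 120 * 1024 ** 3             # the ~120 GB .blb file actually seen

print(f"expected worst case: {expected_max_blb / 1024 ** 2:.0f} MB")
print(f"observed:            {observed_blb / 1024 ** 3:.0f} GB")
print(f"observed / expected: {observed_blb / expected_max_blb:.0f}x")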

--
Tim Young
Elevate Software
www.elevatesoft.com

Mon, Feb 1 2010 11:59 AM

"Simon Page"
Tim Young [Elevate Software] wrote:

> Nothing in DBISAM, specifically.  Re-use of space is done immediately
> for each row.  As of 4.28 and higher, the allocated BLOB blocks for a
> given row stay with that row, even through deletion.  So, when the
> row is re-used, the BLOB blocks are re-used also.  

Just to be clear, are you saying that the number of blob blocks
associated with a particular record can never decrease even when the
record is deleted and the record row re-allocated to a new record with
a smaller blob field?

When blob fields grow, are extra blob blocks just linked to the existing
ones, or does some re-allocation go on at that point?

I can appreciate the reason for the change; I am just trying to work out
how it impacts cases where we use tables that can have significant
variations in blob field size and also multiple blob fields. We may need to
optimise tables more than we currently do, because we compress the table
files for export/import and so want to keep the file sizes down.

For interest - does ElevateDB use a similar system?

Simon
Tue, Feb 2 2010 6:39 AM

(Matthew Jones)
> For interest - does ElevateDB use a similar system?

I'd like to know that too. I must make the move sometime!

/Matthew Jones/
Wed, Feb 3 2010 10:30 AM

Tim Young [Elevate Software]

Elevate Software, Inc.

Email timyoung@elevatesoft.com

Simon,

<< Just to be clear, are you saying that the number of blob blocks
associated with a particular record can never decrease even when the record
is deleted and the record row re-allocated to a new record with a smaller
blob field? >>

Correct.  It avoids serious fragmentation of the BLOB blocks and allows for
the most optimal reads/writes to the BLOBs.

<< For interest - does ElevateDB use a similar system? >>

Yes, the exact same system.

--
Tim Young
Elevate Software
www.elevatesoft.com

Wed, Feb 3 2010 11:08 AM

Roy Lambert

NLH Associates

Team Elevate

Tim

><< Just to be clear, are you saying that the number of blob blocks
>associated with a particular record can never decrease even when the record
>is deleted and the record row re-allocated to a new record with a smaller
>blob field? >>
>
>Correct. It avoids serious fragmentation of the BLOB blocks and allows for
>the most optimal reads/writes to the BLOBs.
>
><< For interest - does ElevateDB use a similar system? >>
>
>Yes, the exact same system.

Just to make it totally clear, I presume that since optimising is essentially writing the table out to new files, that sorts the issue (note I did not say problem Smiley).

Matthew quoted "typical" sizes, and they would not account for 40 records = 120GB, but the above re-use strategy could if there are occasional mega files. I'm not enough of a mathematician to work out how many records would be needed, over what period and with what rate of insert/delete, to produce the effect, but I can picture it happening.

If someone does have the maths skills it would be interesting to know.
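
For anyone who does want to play with the maths, here is a rough simulation sketch (Python; the slot count, cycle count, rates, and the 3GB "mega" size are made-up illustration values, not DBISAM internals) of the re-use strategy described above, where each row slot keeps the largest BLOB allocation it has ever held:

import random

random.seed(1)

SLOTS = 120                      # assumed number of row slots ever allocated
CYCLES = 10_000                  # insert/delete cycles spread over the slots
TYPICAL = (50_000, 5_000_000)    # typical file sizes, per the posted log
MEGA = 3 * 1024 ** 3             # an occasional ~3 GB "mega" file
MEGA_CHANCE = 0.001              # roughly 1 in 1,000 inserts is a mega file

allocated = [0] * SLOTS          # BLOB bytes retained by each row slot

for _ in range(CYCLES):
    slot = random.randrange(SLOTS)
    size = MEGA if random.random() < MEGA_CHANCE else random.randint(*TYPICAL)
    # Blocks stay with the slot; it only grows when the new BLOB is bigger.
    allocated[slot] = max(allocated[slot], size)

print(f"slots that have ever held a mega file: {sum(a >= MEGA for a in allocated)}")
print(f"simulated .blb size: {sum(allocated) / 1024 ** 3:.1f} GB")

With those assumptions the total is dominated by whichever slots have ever been hit by a mega file (roughly that count x 3GB), which is the ratchet effect described above; about forty 3GB hits would be enough to reach 120GB.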

Roy Lambert


Wed, Feb 3 2010 4:22 PM

"Raul"

Even if we assume 3 times the records (the delete happens over 15 minutes after the
insert, so there might be some extra records temporarily) and the worst-case
scenario, i.e. every record has been hit by the largest file, we're still talking
about 1GB per record (120GB file and 120 records).

This is nowhere near the sizes Matthew quoted (it looked to me like a max of maybe
5MB for the ones he posted, and he himself said no more than 10MB in general). So
assuming the max file size is 10MB, we'd need some 12,000 records (if my math is
correct).

So either records (and associated blobs) are not re-used and are continually
allocated anew, or something else is going on here (like a high number of records
and/or some real jumbo files).
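
A quick check of that arithmetic (a small Python sketch using the figures quoted above):

observed = 120 * 1024 ** 3          # the ~120 GB .blb file
records = 40 * 3                    # assume 3x the ~40 live records
print(f"per-record share at 120 records: {observed / records / 1024 ** 3:.0f} GB")

max_blob = 10 * 1024 ** 2           # ~10 MB worst-case file size
print(f"records needed at 10 MB each:    {observed / max_blob:,.0f}")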

Raul


> Matthew quoted "typical" sizes and they would not account for 40 records =
> 120Gb but the above reuse strategy could if there are occasional mega
> files. I'm not enough of a mathematician to work out how many would be
> needed over what period with what rate of insert/delete to produce the
> effect but I can picture it happening.
>
> If someone does have the maths skills it would be interesting to know.
>



Thu, Feb 4 2010 2:56 AM

Roy Lambert

NLH Associates

Team Elevate

Raul


I think (but can't prove it) it's more to do with maximum file size. You only need one file at 120GB and there you go Smiley It's just a thought. Remember, all you need are 40 x 3GB files and you can reach the 120GB.

I have had glitches where files / tables / JPGs have been corrupt and the OS thinks they're massive. However, I must admit, never 120GB <vbg>

Roy Lambert
Thu, Feb 4 2010 4:19 AM

(Matthew Jones)
I will try to get more stats, but there can be worst-case situations where there is
an internet outage for more than 3 minutes, and then each remote site (and there
can be over a hundred) may load its set of files into the database. Thus, over a
long period, the file could get very big in this situation. This is very much an
edge case, but I'll add some query info to my optimise code so that I can find out
what the record counts are. If I ever get any interesting info from the field, I'll
report back.

/Matthew Jones/