Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 10 of 10 total
Thread Sensitive dataset with subselect selecting
Thu, Mar 3 2011 9:06 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

I apologise for the horrible title

Background

Part of my filesharing app is an email subsystem. I move emails from live to an archive table after they pass a user selected age. The list of emails for each mailbox is created when a mailbox is clicked in the tree and displayed using a dbgrid (based on Mike Skolnik's). Speed is good even over the wireless parts of my LAN. The query I use is


SELECT
IF(_Private,$hgPrivate,-1) AS _xPrivacy,
IF(_aList IS NOT NULL, $hgAttachment,-1) AS _xAttachment,
IF(_fkContacts IS NOT NULL AND _fkContacts > 0,
IF(_InOutInd = $MailIn,$hgTfRMailIn,$hgTfRMailOut),
IF(_InOutInd = $MailIn,$hgMailIn,$hgMailOut)) AS _xMailDir,
CAST(RCF(_emDestination,_Destination,', ') AS VARCHAR(110)) AS _xDestination,
CAST(RCF(_emSender,_Sender,', ') AS VARCHAR(110)) AS _xSender,
CAST(RCF(_Comments,_Subject,' § ') AS VARCHAR(210)) AS _xSubject,
DTF(_Timestamp) AS _xTimestamp,
BytesToKbMb(_Size) AS _xSize,
(SELECT _Monitored FROM MailBoxes WHERE _BoxNo = :Mailbox) AND
NOT EXISTS (SELECT * FROM emReadStatus WHERE _fkEMails = _MsgNo and emReadStatus._fkStaff = $StaffID) AS _xUnRead,
EXISTS (SELECT _Blacklisted FROM emBlacklist WHERE _Blacklisted = _emSender) AS _xBlacklisted,
*
FROM $source
WHERE
_fkMailBoxes = :MailBox
$UsePrivacy AND (_fkStaff IS NULL OR _fkStaff = $StaffID OR _Private IS NULL OR _Private = FALSE)
ORDER BY _TimeStamp DESC

All the $... bits are replaced in SQL.Text when the form is opened. EG $source will become either EMails or emArchive

I also have the ability to set a filter. Since most of the emails will have an attached document (a cv) and generally its the content of these attachments that's being searched, and they are not indexed the filters can take 20-40 mins to run and are generated in a background thread which writes out filter ID and email ID to a FilterResults table.

To select the filtered emails the tree is loaded with available filters instead of mailboxes and when one of those is clicked a slightly modified version of the above query is run.

The difference is that the WHERE clause becomes

_MsgNo IN (SELECT _fkID FROM FilterResults WHERE _fkFilters = :emSearch)

This means that I get a sensitive result set based on emails or emArchive. It has two advantages

1. all of the operations that can be carried out both for "normal" email view and filtered view are the same
2. I have an extra button to allow a row to be deleted from the FilterResults table - leaves the email alone but removes from the filtered results.

The Problem

Because of the SELECT in the WHERE clause the query takes time to run. That's not really a problem but if the button to remove an email from the filtered results is clicked the query is reevaluated and that is the problem. I can live with it taking upto 60 seconds from the specific filter being selected to the display being ready but not upto 60 seconds each time someone clicks remove this one from the filter results.

I can't use a join for two reasons

1. it would produce an insensitive result set and I wouldn't be able to remove a email from the displayed results
2. because its having to drag all the data across and write out a table its a lot (think minutes) slower

I'm trying to avoid a separate form or a form with lots of "if its filtered do this otherwise do that"s but I think that's where I'll end up unless someone has a brilliant idea.

I shall now sit back and wait for the brilliant ideas to pour in!

Roy Lambert



Roy Lambert
Thu, Mar 3 2011 10:42 AMPermanent Link

John Hay

Roy

I am not sure if it would help but could you have a table (memory?) with hidden ids in it?  The delete burtton adds the
_msgno to the hidden table and the where clause checks wether  _msgno is in the hidden table.

John

Fri, Mar 4 2011 4:39 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

John

>I am not sure if it would help but could you have a table (memory?) with hidden ids in it? The delete burtton adds the
>_msgno to the hidden table and the where clause checks wether _msgno is in the hidden table.

Interesting idea John. That might let me use a JOIN, combine that with creating a master-detail relationship to the Emails/emArchive table so I only have to move the visible in the grid data to the canned dataset and it might just work.

Oh what a fun weekend I'm going to have.

Roy Lambert
Fri, Mar 4 2011 10:39 AMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Roy,

<< I also have the ability to set a filter. Since most of the emails will
have an attached document (a cv) and generally its the content of these
attachments that's being searched, and they are not indexed the filters can
take 20-40 mins to run and are generated in a background thread which writes
out filter ID and email ID to a FilterResults table. >>

Is there any particular reason that you're not using the text indexing for
this ?

--
Tim Young
Elevate Software
www.elevatesoft.com
Fri, Mar 4 2011 11:30 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Tim

>Is there any particular reason that you're not using the text indexing for
>this ?

We're talking email attachments. EMails of interest are transferred to ELN which is the table that relates to contacts the ones left in emails are the unsolicitated (non-spam) stuff. The reasons are

1. Disk space
2. Time required for indexing vs frequency of searching
3. When I first set things up I wasn't sure how to handle the attachments which could have a .pdf a .doc and a .xls all in the attachment in terms of the filter type column

No. 2 is probably the most important. I would probably want to create an intermediate file to download emails into prior to storing in the real table otherwise it would just slow things down to much.

Roy Lambert
Sun, Mar 6 2011 9:46 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

John


I have things working nice and fast now. Its more complex than your idea. Just for info what I do is:

1. SELECT LIST(CAST(_fkID AS VARCHAR)) AS FList FROM FilterResults WHERE _fkFilters = :filter
2. Use the result of the above to create the query on emails. I have to "rewrite" the query each time a different filter is selected because the WHERE clause is WHERE _MsgNo IN (...) and you can't pass the in-list as a parameter
3. If I delete one of the emails or remove from the filter add _MsgNo to a stringlist and set query.filter := NOT _MsgNo IN invisible.commatext

Totally different to your suggestion Smileybut without it I would never have thought of the approach so thanks.

Roy Lambert
Mon, Mar 7 2011 10:28 AMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Roy,

<< We're talking email attachments. EMails of interest are transferred to
ELN which is the table that relates to contacts the ones left in emails are
the unsolicitated (non-spam) stuff. The reasons are

1. Disk space
2. Time required for indexing vs frequency of searching
3. When I first set things up I wasn't sure how to handle the attachments
which could have a .pdf a .doc and a .xls all in the attachment in terms of
the filter type column >>

Well, remember that the filter type is just a "hint" to your text filtering
module.  If the text that you're filtering is in MIME format, then it will
already have content types that indicate how the various content is encoded.
IOW, you could just set the filter type column to "MIME".

<< No. 2 is probably the most important. I would probably want to create an
intermediate file to download emails into prior to storing in the real table
otherwise it would just slow things down to much. ?? >>

As long as emails are being stored once, and then only retrieved, the only
real catch is getting the block size okay.  We store all of our emails in a
DBISAM table with a BLOB block size of 512 bytes, and it would work equally
well for ElevateDB.  We literally have GBs of BLOBs stored in them - all
inbound and outbound emails with attachments.

--
Tim Young
Elevate Software
www.elevatesoft.com
Mon, Mar 7 2011 11:50 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Tim

>Well, remember that the filter type is just a "hint" to your text filtering
>module. If the text that you're filtering is in MIME format, then it will
>already have content types that indicate how the various content is encoded.
>IOW, you could just set the filter type column to "MIME".

I know that now Smileybut this was all set up when I was converting from ElevateDB to DBISAM and full text indexing was very (and I do mean very) confusing.

><< No. 2 is probably the most important. I would probably want to create an
>intermediate file to download emails into prior to storing in the real table
>otherwise it would just slow things down to much. ?? >>
>
>As long as emails are being stored once, and then only retrieved, the only
>real catch is getting the block size okay. We store all of our emails in a
>DBISAM table with a BLOB block size of 512 bytes, and it would work equally
>well for ElevateDB. We literally have GBs of BLOBs stored in them - all
>inbound and outbound emails with attachments.

I disagree, in terms of receiving emails in, checking for spam and storing to disk adding in full text indexing for attachments (I already have it in place for the message body) would substantially increase the time taken for each email. I know ElevateDB is a lot faster than DBISAM for FTI but it would impose a significant overhead.

However, a wild idea has just been sparked off - an extra column which on receipt of the email is null (or X or something) have an extra thread which checks for these rows and alters the column to M and use that column to determine wether to index attachments or not by having a text filter which simply returns and empty string.

What do you think?

Roy Lambert
Mon, Mar 7 2011 12:00 PMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Woops


ElevateDB to DBISAM

should read

DBISAM to ElevateDB

Roy Lambert



Sun, Mar 13 2011 5:43 PMPermanent Link

Tim Young [Elevate Software]

Elevate Software, Inc.

Avatar

Email timyoung@elevatesoft.com

Roy,

<< However, a wild idea has just been sparked off - an extra column which on
receipt of the email is null (or X or something) have an extra thread which
checks for these rows and alters the column to M and use that column to
determine wether to index attachments or not by having a text filter which
simply returns and empty string. >>

Sure, that would work great.  Having the filter return an empty string would
effectively make it invisible to the text indexing, and any modifications
that turned it back on would simply cause the text indexing to pick up the
new text like it was just added by the application or user.

--
Tim Young
Elevate Software
www.elevatesoft.com
Image