|Messages 11 to 19 of 19 total|
|EMail Message-ID length|
|Wed, Jan 6 2010 7:19 PM||Permanent Link|
Thinking outside of the box - is it possible to parse the header
through an algorithm to come up with a shorter representation?
i.e., parse any email headers with a length > x through the function to
come up with a new shorter result which can be stored. Duplicated emails
would be passed through the same function and come up with the same
result leading to a duplicate flag.
I'm no guru at compression, but figure that email headers are only using
numeric, uppercase and '-' fields, so I guess there should be a way to
compress them to use other ASCII symbols and get a shorter result.
Might not be the answer you're looking for, but just an alternative idea.
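A minimal sketch of that idea in Python (the thread itself contains no code, so the function name `shorten_message_id` and the threshold `MAX_LEN` are hypothetical). IDs longer than the threshold are replaced by a fixed-length digest, so the same input always gives the same short key. This is hashing rather than compression: the original value cannot be recovered from the key, and two different IDs could in principle produce the same key.

```python
import hashlib

MAX_LEN = 40  # hypothetical key width; assumed to be at least 32


def shorten_message_id(message_id: str, max_len: int = MAX_LEN) -> str:
    """Return message_id unchanged if it fits, else a fixed-length digest."""
    if len(message_id) <= max_len:
        return message_id
    # MD5 hex digest is always 32 characters; identical input gives
    # an identical result, which is what the duplicate flag relies on.
    return hashlib.md5(message_id.encode("utf-8")).hexdigest()
```

A duplicate email passed through the same function ends up with the same stored key, so a key lookup is enough to raise the duplicate flag.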
|Thu, Jan 7 2010 1:49 AM||Permanent Link|
You're taking a specific implementation (looks like GMail) and assuming that it applies everywhere. Nowhere in the RFCs have I seen any requirement for an ISP to add their own additional unique identifier.
You quoted from the help file
<<Unique-ID (UID for short) is a unique string assigned to each message in
the mailbox. No two messages in the mailbox can have the same Unique-ID
value. Sometimes Unique-ID is called GUID (globally unique identifier).
Unique-ID is not associated with "Message-ID:" header of the message.>>
The implication here is that the Unique-ID (which is not a standard, and may or may not exist) is within the mailbox (I assume this is not a POP box) so two different mailboxes may have the same Unique-ID. Hence to create a genuine unique id for emails I receive I'd have to add the mailbox onto the Unique-ID.
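As a rough illustration of adding the mailbox onto the Unique-ID, here is a small Python sketch. The names `MessageKey` and `already_downloaded` and the in-memory `seen` set are assumptions made for the example, not anything from the help file or a mail library.

```python
from typing import NamedTuple


class MessageKey(NamedTuple):
    mailbox: str  # hypothetical mailbox label, e.g. "user@host"
    uid: str      # server-assigned Unique-ID, unique only within that mailbox


seen = set()  # stand-in for a table keyed on (mailbox, uid)


def already_downloaded(mailbox: str, uid: str) -> bool:
    """Return True if this (mailbox, uid) pair was seen before, else record it."""
    key = MessageKey(mailbox, uid)
    if key in seen:
        return True
    seen.add(key)
    return False
```

The same Unique-ID arriving from two different mailboxes produces two different keys, so neither message is mistaken for the other.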
I have three POP boxes for business purposes, and as far as I know, none of my suppliers add their own ID, but then none of them are GMail, Yahoo mail or any of the other on-line ones.
It also seems a bit weird to receive a perfectly good GUID and produce another one. The example you show looks like an MD5 hash which is good but not guaranteed to be unique.
Having said that I may be doing something similar with additional checks to make sure that there are no false positives.
|Thu, Jan 7 2010 2:39 AM||Permanent Link|
Essentially that's what I'm thinking of doing - use an MD5 hash, which is 16 bytes long, for the quick check (.FindKey) and, if there's a hit, do a string compare on the Message-ID stored in the header.
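The two-step check described above could look roughly like this in Python. This is a sketch under assumptions: the class `MessageIndex` is hypothetical, and a dictionary stands in for the indexed table and its .FindKey lookup.

```python
import hashlib
from collections import defaultdict


class MessageIndex:
    """Digest lookup first, full Message-ID comparison only on a hit."""

    def __init__(self):
        # digest -> list of full Message-IDs that share that digest
        self._by_digest = defaultdict(list)

    @staticmethod
    def _digest(message_id: str) -> bytes:
        return hashlib.md5(message_id.encode("utf-8")).digest()  # 16 bytes

    def is_duplicate(self, message_id: str) -> bool:
        # Quick check on the short key, then confirm with a string compare
        # so that a hash collision cannot produce a false positive.
        candidates = self._by_digest.get(self._digest(message_id), [])
        return message_id in candidates

    def add(self, message_id: str) -> None:
        self._by_digest[self._digest(message_id)].append(message_id)
```

Because the full Message-ID is compared whenever the digest matches, a collision only costs an extra comparison rather than a wrongly flagged duplicate.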
|Thu, Jan 7 2010 2:43 AM||Permanent Link|
> You're taking a specific implementation (looks like GMail) and assuming that it applies everywhere.
At least all the ISPs I know use it (even the smaller ones). Though I admit
that it doesn't seem to be a standard.
Generally I agree that this GUID cannot replace the MessageID we were
talking about. Anyway it's a mystery to me that there is no standard for
a unique message-id in messages.
But as long as this message-id can even be empty it doesn't make sense
to rely on it.
As I said before the GUID I mentioned is just a good way to decide which
mails have already been downloaded and which not. Nothing else.
BTW I like Adam's idea: it should be possible to "construct" a
unique identifier from the message's headers if you need that.
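One way to "construct" such an identifier, sketched in Python with the standard `email` and `hashlib` modules. The function name `message_key`, the choice of fallback headers and the `synthetic:` prefix are all assumptions for illustration. The Message-ID is used when present, and a digest of a few other headers is used only when it is missing or empty.

```python
import hashlib
from email import message_from_string

# Assumed set of headers to fall back on; not taken from any standard.
FALLBACK_HEADERS = ("Date", "From", "To", "Subject")


def message_key(raw_message: str) -> str:
    """Return the Message-ID if present, else a digest built from other headers."""
    msg = message_from_string(raw_message)
    message_id = str(msg.get("Message-ID", "")).strip()
    if message_id:
        return message_id
    parts = [str(msg.get(name, "")).strip() for name in FALLBACK_HEADERS]
    digest = hashlib.md5("\n".join(parts).encode("utf-8")).hexdigest()
    return "synthetic:" + digest
```

Two genuinely different messages with identical values in those four headers would get the same key, so a constructed key like this is best treated as a hint that still needs a further check.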
|Thu, Jan 7 2010 4:48 AM||Permanent Link|
>Anyway it's a mystery to me that there is no standard for
>a unique message-id in messages.
There is - it's called Message-ID
|Thu, Jan 7 2010 6:03 AM||Permanent Link|
> There is - it's called Message-ID
As long as a message is valid with an empty Message-ID I wouldn't call
|Thu, Jan 7 2010 7:23 AM||Permanent Link|
It's not the standard that's to blame, as you must know if you've written your own email app. M$ allowed OE to cope with so many deviations from the standard, and others copied them, that a very large percentage of my email app's code is there to cope with malformed messages.
|Thu, Jan 7 2010 8:55 AM||Permanent Link|
> I only started doing this so that I could check for duplicate emails,
> before that I just left it in the headers. I've had one suggestion from
> the Synapse mailing list of using a MD5 hash which I'll investigate to see
> what the false positive rate is like.
Sounds like a great idea. Obviously there is a chance of collisions but
it'll likely be OK as I've never seen one with hashes yet (that I know of).
Using the SHA family might be slightly preferable over MD5 but in your
case it won't matter. I assume you're storing the headers anyways so worst
case you have original message id available if you really run into an issue
with the hash.
|Thu, Jan 7 2010 10:19 AM||Permanent Link|
>Sounds like a great idea. Obviously there is a chance of collisions but
>it'll likely be OK as I've never seen one with hashes yet (that I know of).
> Using the SHA family might be slightly preferable over MD5 but in your
>case it won't matter. I assume you're storing the headers anyways so worst
>case you have original message id available if you really run into an issue
>with the hash.
MD5 has an advantage over SHA - I've already got an implementation built into Synapse so I can just use that.