
Thread: EMail Message-ID length
Wed, Jan 6 2010 7:19 PM

"Adam H."
Hi Roy,

Thinking outside of the box - is it possible to pass the header
through an algorithm to come up with a shorter representation?

i.e., pass any email header with a length > x through the function to
come up with a new, shorter result which can be stored. Duplicated emails
would be passed through the same function and come up with the same
result, leading to a duplicate flag.
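
A minimal sketch of this idea, written in Python purely for illustration (the thread contains no code, and the function name `short_key` and the sample Message-IDs are made up). A fixed-length digest stands in for the shorter representation; the same header always produces the same key, so a repeated key flags a possible duplicate.

```python
import hashlib

def short_key(message_id: str) -> bytes:
    """Return a fixed-length (16-byte) stand-in for an arbitrarily long Message-ID."""
    # Assumed normalisation: only surrounding whitespace is stripped.
    normalized = message_id.strip()
    return hashlib.md5(normalized.encode("utf-8")).digest()

# Made-up Message-IDs for demonstration.
a = short_key("<4B44E2A1.1234@example.com>")
b = short_key("<4B44E2A1.1234@example.com>")
c = short_key("<4B44E2A1.9999@example.com>")

print(len(a))   # length of the key in bytes
print(a == b)   # identical input gives an identical key
print(a == c)   # different input gives a different key here
```

Note that this is hashing rather than compression: the original header cannot be recovered from the key, so the full header still has to be stored if it is needed later.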

I'm no guru at compression, but I figure that email headers are only using
numeric, uppercase and '-' characters, so I guess there should be a way to
compress them to use other ASCII symbols and get a shorter result.

Might not be the answer you're looking for, but just an alternative idea.

Cheers

Adam.
Thu, Jan 7 2010 1:49 AM

Roy Lambert

NLH Associates

Team Elevate

Uli

You're taking a specific implementation (looks like GMail) and assuming that it applies everywhere. Nowhere in the RFCs have I seen any requirement for an ISP to add their own additional unique identifier.


You quoted from the help file

<<Unique-ID (UID for short) is a unique string assigned to each message in
the mailbox. No two messages in the mailbox can have the same Unique-ID
value. Sometimes Unique-ID is called GUID (globally unique identifier).
Unique-ID is not associated with "Message-ID:" header of the message.>>

The implication here is that the Unique-ID (which is not a standard, and may or may not exist) is within the mailbox (I assume this is not a POP box) so two different mailboxes may have the same Unique-ID. Hence to create a genuine unique id for emails I receive I'd have to add the mailbox onto the Unique-ID.
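
To illustrate that point, here is a small sketch in Python (chosen only for illustration) where the key is the pair of mailbox and Unique-ID rather than the Unique-ID alone. The names `MailKey`, `seen` and `is_new`, and the mailbox labels in the usage lines, are hypothetical.

```python
from typing import NamedTuple

class MailKey(NamedTuple):
    mailbox: str  # hypothetical account label
    uid: str      # server-assigned Unique-ID, unique only within that mailbox

# Keys of messages that have already been downloaded.
seen = set()

def is_new(mailbox: str, uid: str) -> bool:
    """Record the (mailbox, uid) pair and report whether it was unseen."""
    key = MailKey(mailbox, uid)
    if key in seen:
        return False
    seen.add(key)
    return True

first = is_new("sales", "0001")    # first sighting in mailbox "sales"
other = is_new("support", "0001")  # same UID, different mailbox
again = is_new("sales", "0001")    # same mailbox and UID as the first call
print(first, other, again)
```

The same Unique-ID arriving from two different mailboxes is treated as two distinct messages, which is the behaviour the quoted help text implies.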

I have three popboxes for business purposes, and as far as I know, none of my suppliers add their own ID, but then none of them are GMail, Yahoo mail or any of the other on-line ones.

It also seems a bit weird to receive a perfectly good GUID and produce another one. The example you show looks like an MD5 hash which is good but not guaranteed to be unique.

Having said that I may be doing something similar with additional checks to make sure that there are no false positives.

Roy Lambert
Thu, Jan 7 2010 2:39 AM

Roy Lambert

NLH Associates

Team Elevate

Adam


Essentially that's what I'm thinking of doing - use an MD5 hash, which is 16 characters (bytes) long, for the quick check (.FindKey) and, if there's a hit, do a string compare on the Message-ID stored in the header.
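
The two-step check described here could look roughly like the following. It is a sketch in Python, not the poster's own code: a dictionary stands in for the indexed table and its `.FindKey` lookup, and the names `index`, `digest`, `store` and `is_duplicate` are made up for illustration.

```python
import hashlib
from collections import defaultdict

# In-memory stand-in for an indexed table: digest -> Message-IDs stored under it.
index = defaultdict(list)

def digest(message_id: str) -> bytes:
    """16-byte MD5 digest used as the short lookup key."""
    return hashlib.md5(message_id.encode("utf-8")).digest()

def is_duplicate(message_id: str) -> bool:
    """Quick check on the digest, then confirm with a full string compare."""
    candidates = index.get(digest(message_id), [])
    # A digest hit alone is not proof; compare the stored Message-IDs too.
    return message_id in candidates

def store(message_id: str) -> None:
    """Remember a Message-ID under its digest."""
    index[digest(message_id)].append(message_id)

store("<4B44E2A1.1234@example.com>")  # made-up Message-ID
print(is_duplicate("<4B44E2A1.1234@example.com>"))
print(is_duplicate("<4B44E2A1.9999@example.com>"))
```

Because the full Message-ID is compared whenever the digest matches, a hash collision costs one extra string compare instead of producing a false positive.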

Roy Lambert
Thu, Jan 7 2010 2:43 AM

Uli Becker
Roy,

> You're taking a specific implementation (looks like GMail) and assuming that it applies everywhere.

At least all the ISPs I know use it (even the smaller ones), though I admit
that it doesn't seem to be standard.
Generally I agree that this GUID cannot replace the MessageID we were
talking about. Anyway it's a mystery to me that there is no standard for
a unique message-id in messages.
But as long as this message-id can even be empty it doesn't make sense
to rely on it.

As I said before the GUID I mentioned is just a good way to decide which
mails have already been downloaded and which not. Nothing else.

BTW I like Adam's idea: it should be possible to "construct" a
unique identifier from the message's headers if you need that.
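
One way such a constructed identifier might look, sketched in Python for illustration only. The choice of headers in `FIELDS`, the function name `synthetic_id` and the sample message are all assumptions; the thread does not say which headers to combine, and nothing guarantees the result is unique.

```python
import hashlib
from email import message_from_string

# Assumed set of headers to combine; not specified anywhere in the thread.
FIELDS = ("Date", "From", "To", "Subject")

def synthetic_id(raw_message: str) -> str:
    """Build a 32-character hex identifier from selected headers."""
    msg = message_from_string(raw_message)
    # Missing headers contribute an empty string instead of failing.
    parts = [str(msg.get(name) or "").strip() for name in FIELDS]
    joined = "\n".join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

# Made-up message with no Message-ID header at all.
raw = (
    "Date: Thu, 7 Jan 2010 02:43:00 +0000\n"
    "From: uli@example.com\n"
    "To: roy@example.com\n"
    "Subject: Message-ID length\n"
    "\n"
    "Body text\n"
)

ident = synthetic_id(raw)
print(ident)
```

Two genuinely different messages with identical values in all four headers would get the same identifier, so this is only a fallback for messages whose Message-ID is empty.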

Uli
Thu, Jan 7 2010 4:48 AM

Roy Lambert

NLH Associates

Team Elevate

Uli

>Anyway it's a mystery to me that there is no standard for
>a unique message-id in messages.

There is - it's called Message-ID :)

Roy Lambert
Thu, Jan 7 2010 6:03 AM

Uli Becker
Roy,

> There is - it's called Message-ID :)

As long as a message is valid with an empty Message-ID I wouldn't call
it "standard".

Uli
Thu, Jan 7 2010 7:23 AM

Roy Lambert

NLH Associates

Team Elevate

Uli


It's not the standard that's to blame, as you must know if you've written your own email app. M$ allowed OE to cope with so many deviations from the standard, and others copied them, that a very large percentage of my email app's code is there just to cope with malformed messages.

Roy Lambert
Thu, Jan 7 2010 8:55 AM

"Raul"

> I only started doing this so that I could check for duplicate emails,
> before that I just left it in the headers. I've had one suggestion from
> the Synapse mailing list of using a MD5 hash which I'll investigate to see
> what the false positive rate is like.

Sounds like a great idea. Obviously there is a chance of collisions but
it'll likely be OK as I've never seen one with hashes yet (that I know of
:) ). Using the SHA family might be slightly preferable over MD5 but in your
case it won't matter. I assume you're storing the headers anyway, so worst
case you have the original Message-ID available if you really run into an issue
with the hash.
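
For a sense of the trade-off, the digest sizes can be compared with a few lines of Python (used here only for illustration; the Message-ID is made up). A longer digest makes an accidental collision less likely but needs a wider key field.

```python
import hashlib

data = b"<4B44E2A1.1234@example.com>"  # made-up Message-ID

# Digest length in bytes for MD5 and two members of the SHA family.
sizes = {
    name: hashlib.new(name, data).digest_size
    for name in ("md5", "sha1", "sha256")
}
print(sizes)  # {'md5': 16, 'sha1': 20, 'sha256': 32}
```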

Raul

Thu, Jan 7 2010 10:19 AM

Roy Lambert

NLH Associates

Team Elevate

Raul

>Sounds like a great idea. Obviously there is a chance of collisions but
>it'll likely be OK as I've never seen one with hashes yet (that I know of
>:) ). Using the SHA family might be slightly preferable over MD5 but in your
>case it won't matter. I assume you're storing the headers anyway, so worst
>case you have the original Message-ID available if you really run into an issue
>with the hash.

MD5 has an advantage over SHA - I've already got an implementation built into Synapse so I can just use that.

Roy Lambert