Support Forums - View Thread

Messages 11 to 19 of 19 total

EMail Message-ID length

Wed, Jan 6 2010 7:19 PM
"Adam H."
Hi Roy,

Thinking outside the box: is it possible to pass the header through an algorithm to come up with a shorter representation? I.e., run any email header longer than x through the function to get a new, shorter result which can be stored. Duplicate emails would pass through the same function and produce the same result, so they would be flagged as duplicates.

I'm no guru at compression, but I figure that email headers only use digits, uppercase letters and '-', so I guess there should be a way to compress them using other ASCII symbols and get a shorter result.

Might not be the answer you're looking for, but it's an alternative idea.

Cheers
Adam

Thu, Jan 7 2010 1:49 AM
Roy Lambert NLH Associates Team Elevate
Uli

You're taking a specific implementation (looks like GMail) and assuming that it applies everywhere. Nowhere in the RFCs have I seen any requirement for an ISP to add its own additional unique identifier.

You quoted from the help file:

<<Unique-ID (UID for short) is a unique string assigned to each message in the mailbox. No two messages in the mailbox can have the same Unique-ID value. Sometimes Unique-ID is called GUID (globally unique identifier). Unique-ID is not associated with "Message-ID:" header of the message.>>

The implication here is that the Unique-ID (which is not a standard, and may or may not exist) is only unique within the mailbox (I assume this is not a POP box), so two different mailboxes may contain messages with the same Unique-ID. Hence, to create a genuinely unique ID for the emails I receive, I'd have to add the mailbox to the Unique-ID.

I have three POP boxes for business purposes, and as far as I know none of my suppliers add their own ID, but then none of them are GMail, Yahoo Mail or any of the other online ones. It also seems a bit weird to receive a perfectly good GUID and produce another one.

The example you show looks like an MD5 hash, which is good but not guaranteed to be unique. Having said that, I may end up doing something similar myself, with additional checks to make sure there are no false positives.

Roy Lambert

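Roy's point that a server-assigned Unique-ID is only unique within its own mailbox means the lookup key has to include the mailbox. (POP3's UIDL values, for example, are defined in RFC 1939 as unique only within a maildrop.) A minimal Python sketch, with a hypothetical `MailboxUid` key and an in-memory set standing in for a database table:

```python
from typing import NamedTuple


class MailboxUid(NamedTuple):
    """Hypothetical composite key: (mailbox, Unique-ID) instead of Unique-ID alone."""
    mailbox: str  # account/mailbox identifier, e.g. "info@pop.example.com" (assumed format)
    uid: str      # server-assigned Unique-ID, only unique within that mailbox


_seen: set[MailboxUid] = set()


def check_and_record(mailbox: str, uid: str) -> bool:
    """Return True if this (mailbox, uid) pair was seen before; otherwise record it."""
    key = MailboxUid(mailbox, uid)
    if key in _seen:
        return True
    _seen.add(key)
    return False
```

With this key, the same UID arriving from two different mailboxes counts as two different messages, which is exactly the case a UID-only key would get wrong.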
Thu, Jan 7 2010 2:39 AM
Roy Lambert NLH Associates Team Elevate
Adam

Essentially that's what I'm thinking of doing: use an MD5 hash, which is 16 characters (bytes) long, for the quick check (.FindKey), and if there's a hit, do a string compare on the Message-ID stored in the header.

Roy Lambert

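The two-step check Roy describes can be sketched in Python as follows. A dictionary keyed on the 16-byte MD5 digest stands in for the indexed table and `.FindKey`; on a hit, the full Message-ID is compared so that a hash collision is never reported as a duplicate. The class and method names are hypothetical. Note that `digest()` returns 16 raw bytes; the hex form from `hexdigest()` would be 32 characters.

```python
import hashlib


class MessageIdIndex:
    """Hypothetical duplicate index: MD5 digest for the fast lookup, exact compare to confirm."""

    def __init__(self) -> None:
        # digest -> all full Message-IDs that produced that digest (normally just one)
        self._by_digest: dict[bytes, list[str]] = {}

    @staticmethod
    def _key(message_id: str) -> bytes:
        return hashlib.md5(message_id.encode("utf-8")).digest()  # 16 raw bytes

    def is_duplicate(self, message_id: str) -> bool:
        candidates = self._by_digest.get(self._key(message_id))
        if candidates is None:
            return False  # quick miss: no stored message has this digest
        # Digest hit: only a candidate until the full strings match.
        return message_id in candidates

    def add_if_new(self, message_id: str) -> bool:
        """Store the Message-ID and return True, or return False if it is a duplicate."""
        if self.is_duplicate(message_id):
            return False
        self._by_digest.setdefault(self._key(message_id), []).append(message_id)
        return True
```

In a database, the digest column would carry the index and the full Message-ID would sit alongside it (or in the stored headers), so the second compare only runs on the rare digest hit.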
Thu, Jan 7 2010 2:43 AM
Uli Becker
Roy,

> You're taking a specific implementation (looks like GMail) and assuming that it applies everywhere.

At least all the ISPs I know use it (even the smaller ones), though I admit that it doesn't seem to be standard. Generally I agree that this GUID cannot replace the Message-ID we were talking about.

Anyway, it's a mystery to me that there is no standard for a unique message ID in messages. But as long as this Message-ID can even be empty, it doesn't make sense to rely on it. As I said before, the GUID I mentioned is just a good way to decide which mails have already been downloaded and which haven't. Nothing else.

BTW, I like Adam's idea: it should be possible to "construct" a unique identifier from the message's headers if you need one.

Uli

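RFC 5322 says every message SHOULD (not MUST) have a Message-ID field, which is why it can turn up missing or empty. Adam's and Uli's idea of constructing an identifier from the headers could look like the Python sketch below: use the Message-ID when present, otherwise hash a fixed set of other headers. The choice of fallback headers (`FALLBACK_HEADERS`) and the function name are assumptions for illustration only; two genuinely different messages with identical values in those headers would be treated as duplicates.

```python
import hashlib
from email import message_from_string

# Hypothetical choice of fallback headers; which ones to combine is a judgment call, not a standard.
FALLBACK_HEADERS = ("From", "To", "Date", "Subject")


def message_fingerprint(raw_message: str) -> str:
    """Return an MD5 hex fingerprint based on Message-ID, or on other headers if it is empty."""
    msg = message_from_string(raw_message)
    message_id = str(msg.get("Message-ID", "")).strip()
    if message_id:
        source = "Message-ID:" + message_id
    else:
        # No usable Message-ID: derive an identifier from a fixed set of other headers.
        source = "\n".join(f"{h}:{str(msg.get(h, '')).strip()}" for h in FALLBACK_HEADERS)
    return hashlib.md5(source.encode("utf-8")).hexdigest()
```

The "Message-ID:" prefix keeps the two kinds of source string from ever colliding with each other by construction.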
Thu, Jan 7 2010 4:48 AM
Roy Lambert NLH Associates Team Elevate
Uli

> Anyway it's a mystery to me that there is no standard for
> a unique message-id in messages.

There is: it's called Message-ID.

Roy Lambert

Thu, Jan 7 2010 6:03 AM
Uli Becker
Roy,

> There is - its called Message-ID

As long as a message is valid with an empty Message-ID, I wouldn't call it "standard".

Uli

Thu, Jan 7 2010 7:23 AM
Roy Lambert NLH Associates Team Elevate
Uli

It's not the standard that's to blame, as you must know if you've written your own email app. M$ allowed OE (Outlook Express) to cope with so many deviations from the standard, and others copied them, that a very large percentage of my email app's code is there just to cope with malformed messages.

Roy Lambert

Thu, Jan 7 2010 8:55 AM
"Raul"
> I only started doing this so that I could check for duplicate emails,
> before that I just left it in the headers. I've had one suggestion from
> the Synapse mailing list of using a MD5 hash which I'll investigate to see
> what the false positive rate is like.

Sounds like a great idea. Obviously there is a chance of collisions, but it'll likely be OK, as I've never seen one with hashes yet (that I know of). Using the SHA family might be slightly preferable to MD5, but in your case it won't matter. I assume you're storing the headers anyway, so in the worst case you have the original Message-ID available if you really run into an issue with the hash.

Raul

Thu, Jan 7 2010 10:19 AM
Roy Lambert NLH Associates Team Elevate
Raul

> Sounds liks a great idea. Obviously there is a chance of collisions but
> it'll likely be OK as i've never seen one with hashes yet (that i know of
> Using SHA family might be slightly preferrable over MD5 but in your
> case it won't matter. I assume you're storing the headers anyways so worst
> case you have original message id available if you really run into an issue
> with the hash.

MD5 has one advantage over SHA: I've already got an implementation built into Synapse, so I can just use that.

Roy Lambert
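Raul's "chance of collisions" can be put into numbers with the usual birthday-bound approximation, which treats the hash as a random function: for n distinct inputs and a b-bit hash, P(collision) is approximately 1 - exp(-n(n-1) / 2^(b+1)). MD5's known weaknesses concern deliberately crafted collisions, not accidental ones between ordinary Message-IDs. The helper below is hypothetical, and the one-million-message count in the loop is an arbitrary example.

```python
import math


def collision_probability(n_messages: int, hash_bits: int) -> float:
    """Approximate chance that any two of n distinct inputs share a hash_bits-bit hash."""
    # Birthday bound: P ~= 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps tiny values accurate.
    exponent = n_messages * (n_messages - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(-exponent)


for name, bits in (("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)):
    print(f"{name}: {collision_probability(1_000_000, bits):.3g}")
```

For a million messages this comes out on the order of 10^-27 for MD5 and smaller still for the SHA variants, so with Roy's exact-compare step behind the hash the practical difference between them is negligible.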
