Stop word lists |
Thu, Nov 29 2007 3:48 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Tim
In your textfilter do you use a sorted stringlist for the stopwords or do you have a faster approach? If so can you share it please? Roy Lambert |
Thu, Nov 29 2007 4:56 PM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Roy,
<< In your textfilter do you use a sorted stringlist for the stopwords or do you have a faster approach? If so can you share it please? >>

It uses a pre-sorted, constant array of strings:

   NUM_STOP_WORDS = 144;

   STOP_WORDS: array [0..NUM_STOP_WORDS-1] of TEDBString = (
      'ABOUT','ABOVE','AFAIK','ALL','ALONG','ALSO','ALTHOUGH','AND','ARE','ARENT',
      'BECAUSE','BEEN','BTW','BUT','CAN','CANNOT','CANT','COULD','COULDNT','DID',
      'DIDNT','DOES','DOESNT','DUH','EITHER','ETC','EVEN','EVER','FOR','FROM',
      'FURTHERMORE','FYI','GET','GETS','GOT','GOTTEN','HAD','HADNT','HARDLY',
      'HAS','HASNT','HAVING','HENCE','HER','HERE','HERS','HEREBY','HEREIN',
      'HEREOF','HEREON','HERETO','HEREWITH','HIM','HIS','HOW','HOWEVER','IMHO','IMO',
      'INTO','ISNT','ITS','LOL','MINE','NOR','NOT','ONTO','OTHER','OTOH','OUR',
      'OURS','OUT','OVER','REALLY','ROTFL','SAID','SAME','SHE','SHOULD','SHOULDNT',
      'SINCE','SOMEWHAT','SUCH','THAN','THAT','THATLL','THATS','THE','THEIR',
      'THEIRS','THEM','THEN','THERE','THEREBY','THEREFORE','THEREFROM',
      'THEREIN','THEREOF','THEREON','THERETO','THEREWITH','THESE','THEY',
      'THEYLL','THEYRE','THIS','THOSE','THROUGH','THROUGHOUT','THUS','TIA','TOO',
      'UNDER','UNTIL','UNTO','UPON','VERY','WAS','WASNT','WERE','WERENT','WHAT',
      'WHEN','WHERE','WHEREBY','WHEREIN','WHETHER','WHICH','WHILE','WHO','WHOM',
      'WHOS','WHOSE','WHY','WITH','WITHIN','WITHOUT','WONT','WOULD','WOULDNT',
      'YOU','YOULL','YOUR','YOURE','YOURS');

--
Tim Young
Elevate Software
www.elevatesoft.com |
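A pre-sorted constant array only pays off if it really is in ascending order under whatever comparison the lookup uses. The sketch below is one way to check that; it is not part of ElevateDB. It assumes a plain ordinal comparison (CompareStr from SysUtils) rather than the locale-aware comparison the product may use, and the names SAMPLE and FirstUnsorted are made up for illustration. The sample is a six-word excerpt copied in the order posted above, where 'HERS' appears before 'HEREBY'.

```pascal
program CheckSorted;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Excerpt in the order posted above (not the full 144-entry list).
  SAMPLE: array [0..5] of string =
    ('HENCE', 'HER', 'HERE', 'HERS', 'HEREBY', 'HEREIN');

// Returns the index of the first entry that is not strictly greater than
// its predecessor (so duplicates are flagged too), or -1 if the array is
// in strictly ascending CompareStr order.
function FirstUnsorted(const Words: array of string): Integer;
var
  I: Integer;
begin
  Result := -1;
  for I := 1 to High(Words) do
    if CompareStr(Words[I - 1], Words[I]) >= 0 then
    begin
      Result := I;
      Exit;
    end;
end;

var
  Bad: Integer;
begin
  Bad := FirstUnsorted(SAMPLE);
  if Bad < 0 then
    WriteLn('sorted')
  else
    WriteLn('out of order at index ', Bad, ': ', SAMPLE[Bad]);
end.
```

Under ordinal comparison this reports index 4 ('HEREBY'), because 'HERS' sorts after every 'HERE...' word. Whether that matters in practice depends on the comparison actually used, but a check like this is cheap to run once at startup or in a unit test.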
Fri, Nov 30 2007 4:11 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Tim
Not sure I can translate that to a user defined list so I'll go for a sorted stringlist. It should never be that long anyway (famous last words). Roy Lambert |
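For a user-defined list, the sorted stringlist approach mentioned here could look roughly like the sketch below. It assumes the standard TStringList from the Classes unit; the helper names CreateStopWordList and IsUserStopWord are hypothetical and not part of ElevateDB. Words are upper-cased on the way in and on lookup, mirroring the upper-case built-in list.

```pascal
program UserStopWords;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Builds a sorted, duplicate-free list. The caller owns and frees it.
function CreateStopWordList(const Words: array of string): TStringList;
var
  I: Integer;
begin
  Result := TStringList.Create;
  Result.Duplicates := dupIgnore;  // skip words that are already present
  Result.Sorted := True;           // keep the list ordered as words are added
  for I := Low(Words) to High(Words) do
    Result.Add(UpperCase(Words[I]));
end;

// Find does a binary search; it is only valid on a list with Sorted = True.
function IsUserStopWord(List: TStringList; const Value: string): Boolean;
var
  Index: Integer;
begin
  Result := List.Find(UpperCase(Value), Index);
end;

var
  StopWords: TStringList;
begin
  StopWords := CreateStopWordList(['the', 'and', 'btw', 'and']);
  try
    WriteLn(StopWords.Count);
    WriteLn(IsUserStopWord(StopWords, 'btw'));
    WriteLn(IsUserStopWord(StopWords, 'database'));
  finally
    StopWords.Free;
  end;
end.
```

Because the list keeps itself sorted, lookups stay logarithmic in the number of words, so growth well beyond a few hundred entries should not be a problem for the search itself.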
Mon, Dec 3 2007 8:03 PM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Roy,
<< Not sure I can translate that to a user defined list so I'll go for a sorted stringlist. It should never be that long anyway (famous last words). >>

Isn't most text comprised of a smaller set of English words in the sub-20,000 word range? If so, then the stop word list should remain pretty small over time.

--
Tim Young
Elevate Software
www.elevatesoft.com |
Tue, Dec 4 2007 3:24 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Tim
>Isn't most text comprised of a smaller set of English words in the
>sub-20,000 word range ? If so, then the stop word list should remain pretty
>small over time.

See I told you so - famous last words - what you're saying is I can grow the stringlist to 19,999 items

Roy Lambert |
Tue, Dec 4 2007 9:09 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Tim
OK, I'm thick, how do you test if a word is in the sorted list? Roy Lambert |
Tue, Dec 4 2007 4:42 PM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Roy,
<< OK, I'm thick, how do you test if a word is in the sorted list? >>

In edbstring.pas there's this function:

function IsStopWord(Locale: Integer; const Value: TEDBString): Boolean;
var
   CompareResult: Integer;
   I: Integer;
   Low: Integer;
   High: Integer;
begin
   Result:=False;
   Low:=0;
   High:=(NUM_STOP_WORDS-1);
   while (Low <= High) do
      begin
      I:=((Low+High) div 2);
      CompareResult:=CompareStrings(Locale,STOP_WORDS[I],Value); // CompareStrings is in edbstring.pas also
      case CompareResult of
         CMP_GREATER:
            High:=(I-1);
         CMP_LESS:
            Low:=(I+1);
         CMP_EQUAL:
            begin
            Result:=True;
            Break;
            end;
         end;
      end;
end;

--
Tim Young
Elevate Software
www.elevatesoft.com |
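The function above is a binary search, and it relies on CompareStrings, the CMP_* constants and TEDBString from edbstring.pas. For anyone wanting to try the same idea outside ElevateDB, here is a minimal stand-alone sketch. It swaps in CompareStr from SysUtils (ordinal, not locale-aware) for CompareStrings, and the names MY_STOP_WORDS and ContainsWord are hypothetical. The array must already be in ascending CompareStr order.

```pascal
program StopWordSearch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Hypothetical short list, in ascending CompareStr order.
  MY_STOP_WORDS: array [0..4] of string = ('AND', 'BTW', 'FOR', 'THE', 'YOU');

// Binary search over a sorted open array; returns True if Value is present.
function ContainsWord(const Words: array of string;
  const Value: string): Boolean;
var
  First, Last, Middle, Cmp: Integer;
begin
  Result := False;
  First := 0;
  Last := High(Words);  // -1 for an empty array, so the loop is skipped
  while First <= Last do
  begin
    Middle := First + (Last - First) div 2;
    Cmp := CompareStr(Words[Middle], Value);
    if Cmp > 0 then
      Last := Middle - 1       // entry is greater: look in the lower half
    else if Cmp < 0 then
      First := Middle + 1      // entry is smaller: look in the upper half
    else
    begin
      Result := True;
      Exit;
    end;
  end;
end;

begin
  WriteLn(ContainsWord(MY_STOP_WORDS, 'THE'));
  WriteLn(ContainsWord(MY_STOP_WORDS, 'DATABASE'));
end.
```

Each pass halves the range still to be searched, so a 144-entry list needs at most 8 comparisons per lookup.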
Wed, Dec 5 2007 4:20 AM | Permanent Link |
Roy Lambert NLH Associates Team Elevate | Tim
Nice - I may steal it for something. Roy Lambert |
This web page was last updated on Tuesday, April 30, 2024 at 03:55 PM |