Full text indexing

Wed, Aug 30 2006 4:35 PM
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com

Michael,

<< I'm sure you've done this already, but just in case.... In the caller code, make sure to raise an error if the word generator didn't increment Position by at least 1. Otherwise you might end up with an infinite loop, just because someone has made a silly mistake in their code. I can't tell you how many times I've done just that. >>

Actually, no, I hadn't done that yet since it really falls under the general category of "oops, I screwed up". In addition, I'm not sure that I want to dictate that every call to the word generator must increment the text position. If someone is doing stemming or something similar, they may or may not increment the text position on every call.

--
Tim Young
Elevate Software
www.elevatesoft.com

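For illustration only, the guard Michael describes could look roughly like the Python sketch below. Everything in it is an assumption made for the example: the callback shape, the names `whitespace_generator` and `collect_words`, and the `max_stalls` parameter are hypothetical and are not the ElevateDB word generator interface. The `max_stalls` knob is one possible way to reconcile the two positions in this thread: it lets a generator return a word without advancing a limited number of times (for example, to emit a stem), while still failing loudly if it never moves.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callback shape (NOT the ElevateDB signature): takes the text
# and the current position, returns (word or None, new position).
WordGenerator = Callable[[str, int], Tuple[Optional[str], int]]


def whitespace_generator(text: str, pos: int) -> Tuple[Optional[str], int]:
    """Return the next whitespace-delimited word and the position after it."""
    n = len(text)
    while pos < n and text[pos].isspace():
        pos += 1
    start = pos
    while pos < n and not text[pos].isspace():
        pos += 1
    return (text[start:pos] if pos > start else None), pos


def collect_words(text: str, generate: WordGenerator, max_stalls: int = 0) -> List[str]:
    """Call the generator until the text is consumed.

    Raises RuntimeError if the generator moves backwards, or fails to
    advance more than max_stalls times in a row.
    """
    words: List[str] = []
    pos = 0
    stalls = 0
    while pos < len(text):
        word, new_pos = generate(text, pos)
        if new_pos < pos:
            raise RuntimeError(f"word generator moved backwards at position {pos}")
        if new_pos == pos:
            stalls += 1
            if stalls > max_stalls:
                raise RuntimeError(f"word generator did not advance at position {pos}")
        else:
            stalls = 0
        if word is not None:
            words.append(word)
        pos = new_pos
    return words
```

With `max_stalls=0` this is the strict check Michael asks for; a larger value is closer to the flexibility Tim wants to keep, at the cost of a later failure when the generator really is stuck.
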
Wed, Aug 30 2006 4:39 PM
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com

Roy,

<< In this case all I was objecting to was the word "default", not the concept or your non-supply of every possible variation I can think of in deciding which words to index. >>

Well, "default" was used for lack of a better word. It is the default behavior, hence I picked "default".

<< However, since you've given more detail, and even though it's really too early to ask for details: is EDBWordGeneratorModuleGenerateWord going to be called once per word, as I'm guessing? >>

Yes, but once per actual word that will be indexed, not once per word in the text.

<< Finally reading ".DLL" - shudder >>

Uggh, darn religious people....

--
Tim Young
Elevate Software
www.elevatesoft.com

Wed, Aug 30 2006 6:49 PM
Michael Baytalsky

Tim,

> text position. If someone is doing stemming or something similar, they may
> or may not increment the text position on every call.

I'm not convinced, although I'm not sure I understand the word "stemming". If you need to perform two steps, you can certainly do both at once; there's no reason to have it invoke the same procedure twice with the same parameters. It's just going to be the most frequent error, like "oops, I didn't realize I had to increment the counter". Just my 2c.

Michael

Thu, Aug 31 2006 4:32 AM
Roy Lambert NLH Associates Team Elevate

Tim

><< However, since you've given more detail, and even though it's really too
>early to ask for details: is EDBWordGeneratorModuleGenerateWord going to be
>called once per word, as I'm guessing? >>
>
>Yes, but once per actual word that will be indexed, not once per word in the
>text.

This could present problems. I don't know about anyone else, but when I'm stripping codes from HTML I simply zap the whole of the style definition, i.e. everything between <style> and </style>. I hope there is another module or something that allows the text as a whole to be processed, as at present?

Roy Lambert

Thu, Aug 31 2006 8:55 AM
Dan Rootham

Michael,

<< I'm not sure I understand the word "stemming" >>

If you have an inflected language, it's useful to isolate the stem of a noun or verb and use that for the search. For example:

remove
removes
removing
removed
removal
remover

Here the stem would be "remov", and from this stem you can build any inflected form of the word.

Stemming becomes really important with highly inflected languages with different gender endings (masculine, feminine, neuter), different case endings (nominative, accusative, genitive, dative) and many different verb endings both for tense (present, future, past) and for person (I, you, he, she, we, they).

And FWIW, stemming is the basis of all "real" machine translation systems.

HTH,
Dan

Lexicon Software Ltd, Bath, UK

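As a rough illustration of Dan's example, the Python sketch below strips a small, hand-picked list of English suffixes so that all six forms reduce to "remov". It is a toy: the suffix list, the `toy_stem` name and the `min_stem` parameter are assumptions chosen to fit this one example, and real stemmers use far more careful rules.

```python
# Hand-picked suffixes that cover the "remove" family from the post above.
# This list is an assumption for the example, not a general English rule set.
SUFFIXES = ("ing", "ed", "es", "er", "al", "e", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_stem letters."""
    lowered = word.lower()
    # Try longer suffixes first so "removes" loses "es" rather than just "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem:
            return lowered[: -len(suffix)]
    return lowered


forms = ["remove", "removes", "removing", "removed", "removal", "remover"]
stems = {toy_stem(form) for form in forms}
```

For the six forms listed, `stems` ends up holding the single value "remov", so a search on any one form could match documents indexed under the others.
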
Thu, Aug 31 2006 11:42 AM
Michael Baytalsky

Dan,

> If you have an inflected language, it's useful to isolate the stem of a noun

Very informative, I didn't know that, thanks! It's still not quite clear to me why one would need a double run over the same portion of text to do stemming (Tim's initial argument). But I guess it's not really important. I just wanted to bring a possible problem to Tim's attention, at which point it's totally up to him to figure out what to do or not do about it.

Cheers,
Michael

Thu, Aug 31 2006 6:04 PM
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com

Roy,

<< This could present problems. I don't know about anyone else, but when I'm stripping codes from HTML I simply zap the whole of the style definition, i.e. everything between <style> and </style>. >>

You're getting word generation mixed up with text filtering. HTML code stripping would occur in a text filter, which is a different, but similar, type of external module that is called once for the entire BLOB contents.

--
Tim Young
Elevate Software
www.elevatesoft.com

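The split Tim describes can be pictured as two stages: a filter that sees the whole text once, followed by word generation over the filtered result. The Python sketch below only illustrates that separation; the function names, the regular expressions, and the choice of what counts as a word are all assumptions and say nothing about how the ElevateDB modules are actually declared or called.

```python
import re
from typing import Iterator, List

# Hypothetical patterns for the example only.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")
WORD = re.compile(r"[A-Za-z0-9]+")


def filter_text(html: str) -> str:
    """Stage 1: run once over the entire content.

    Drops whole <style>...</style> blocks first, then any remaining tags.
    """
    without_style = STYLE_BLOCK.sub(" ", html)
    return TAG.sub(" ", without_style)


def generate_words(text: str) -> Iterator[str]:
    """Stage 2: yield once per word that will actually be indexed."""
    for match in WORD.finditer(text):
        yield match.group(0).lower()


def index_terms(html: str) -> List[str]:
    """Filter the whole text, then generate the words to index."""
    return list(generate_words(filter_text(html)))


page = "<html><style>p { color: red }</style><p>Full text indexing</p></html>"
terms = index_terms(page)
```

Because the style block is removed in the filter stage, words such as "color" and "red" never reach the word generator at all, which is the behavior Roy wants to keep.
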
Fri, Sep 1 2006 3:41 AM
Roy Lambert NLH Associates Team Elevate

Tim

Phew

Roy Lambert

ps I wasn't getting confused, just wanted to make sure the other approach (which I currently use) was still there - roll on ElevateDB (beta or whatever).

Fri, Sep 1 2006 5:12 PM
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com

Roy,

<< ps I wasn't getting confused, just wanted to make sure the other approach (which I currently use) was still there - roll on ElevateDB (beta or whatever). >>

No problem.

--
Tim Young
Elevate Software
www.elevatesoft.com
