Full text indexing

Wed, Aug 30 2006 4:35 PM
Tim Young [Elevate Software], Elevate Software, Inc. (timyoung@elevatesoft.com)

Michael,

<< I'm sure you've done this already, but just in case... In the caller code, make sure to raise an error if the word generator didn't increment Position by at least 1. Otherwise you might end up with an infinite loop just because someone made a silly mistake in their code. I can't tell you how many times I've done just that. >>

Actually, no, I hadn't done that yet, since it really falls under the general category of "oops, I screwed up". In addition, I'm not sure I want to dictate that every call to the word generator must increment the text position. If someone is doing stemming or something similar, they may or may not increment the text position on every call.

--
Tim Young
Elevate Software
www.elevatesoft.com
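Michael's safeguard can be sketched as a caller loop that refuses to continue when the generator does not move forward. This is a minimal Python illustration under stated assumptions, not ElevateDB's module interface: the `(word, next_position)` return convention, `index_words`, `WordGeneratorError` and the toy `whitespace_generator` are all hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical word-generator signature: given the text and a start position,
# return (word, next_position), or None when no more words remain.
WordGenerator = Callable[[str, int], Optional[Tuple[str, int]]]


class WordGeneratorError(RuntimeError):
    """Raised when a word generator fails to advance the text position."""


def index_words(text: str, generate_word: WordGenerator) -> List[str]:
    """Call the generator until it reports the end of the text."""
    words: List[str] = []
    position = 0
    while True:
        result = generate_word(text, position)
        if result is None:
            break  # generator reports end of text
        word, next_position = result
        # Guard suggested in the thread: without it, a generator that never
        # moves forward would make this loop run forever.
        if next_position <= position:
            raise WordGeneratorError(
                f"word generator did not advance past position {position}"
            )
        words.append(word)
        position = next_position
    return words


def whitespace_generator(text: str, position: int) -> Optional[Tuple[str, int]]:
    """Toy generator (hypothetical): returns the next whitespace-separated word."""
    n = len(text)
    while position < n and text[position].isspace():
        position += 1
    if position >= n:
        return None
    start = position
    while position < n and not text[position].isspace():
        position += 1
    return text[start:position], position


print(index_words("full text indexing", whitespace_generator))  # ['full', 'text', 'indexing']
```

A generator that always returns its input position triggers the error instead of hanging. As Tim notes, a check this strict would also reject a generator that deliberately emits several terms (for example a word and its stem) without advancing; a looser variant could allow a bounded number of consecutive non-advancing calls.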

Wed, Aug 30 2006 4:39 PM
Tim Young [Elevate Software], Elevate Software, Inc. (timyoung@elevatesoft.com)

Roy,

<< In this case all I was objecting to was the word "default", not the concept or your non-supply of every possible variation I can think of in deciding which words to index. >>

Well, "default" was used for lack of a better word. It is the default behavior, hence I picked "default".

<< However, since you've given more detail, and even though it's really too early to ask for details: is EDBWordGeneratorModuleGenerateWord going to be called once per word, as I'm guessing? >>

Yes, but once per actual word that will be indexed, not once per word in the text.

<< Finally, reading ".DLL" - shudder >>

Uggh, darn religious people....

--
Tim Young
Elevate Software
www.elevatesoft.com
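Tim's answer implies that any skipping happens inside the generator: each call returns the next word that will actually be indexed. A minimal Python sketch, assuming a hypothetical `generate_word(text, position)` that returns `(word, next_position)` or `None`, with an illustrative stop-word list standing in for whatever rules decide which words get indexed.

```python
import re
from typing import Optional, Tuple

# Hypothetical stop-word list; words in it are never returned to the caller.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}

# Assumption: a "word" is a run of ASCII letters, digits or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def generate_word(text: str, position: int) -> Optional[Tuple[str, int]]:
    """Return the next word worth indexing at or after `position`.

    Stop words are skipped inside this single call, so the caller invokes
    the generator once per indexed word, not once per word in the text.
    """
    while True:
        match = WORD_RE.search(text, position)
        if match is None:
            return None  # no indexable words left
        position = match.end()
        word = match.group(0).lower()
        if word not in STOP_WORDS:
            return word, position


position = 0
indexed = []
while (result := generate_word("The state of the art", position)) is not None:
    word, position = result
    indexed.append(word)
print(indexed)  # ['state', 'art']
```

The three stop words in "The state of the art" are consumed within the two calls that return `state` and `art`, so they never reach the caller.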

Wed, Aug 30 2006 6:49 PM
Michael Baytalsky

Tim,

> If someone is doing stemming or something similar, they may or may not increment the text position on every call.

I'm not convinced, although I'm not sure I understand the word "stemming". If you need to perform two steps, you can certainly do both in one call; there's no reason to have the engine invoke the same procedure twice with the same parameters. Otherwise it's just going to be the most frequent error, like "oops, I didn't realize I had to increment the counter".

Just my 2c.

Michael

Thu, Aug 31 2006 4:32 AM
Roy Lambert, NLH Associates (Team Elevate)

Tim,

> << However, since you've given more detail, and even though it's really too early to ask for details: is EDBWordGeneratorModuleGenerateWord going to be called once per word, as I'm guessing? >>
>
> Yes, but once per actual word that will be indexed, not once per word in the text.

This could present problems. I don't know about anyone else, but when I'm stripping codes from HTML I simply zap the whole of the style definition, i.e. everything between <style> and </style>. I hope there is another module or something that allows the text as a whole to be processed, as at present?

Roy Lambert

Thu, Aug 31 2006 8:55 AM
Dan Rootham

Michael,

<< I'm not sure I understand the word "stemming" >>

If you have an inflected language, it's useful to isolate the stem of a noun or verb and use that for the search. For example:

remove
removes
removing
removed
removal
remover

Here the stem would be "remov", and from this stem you can build any inflected form of the word.

Stemming becomes really important with highly inflected languages that have different gender endings (masculine, feminine, neuter), different case endings (nominative, accusative, genitive, dative), and many different verb endings, both for tense (present, future, past) and for person (I, you, he, she, we, they).

And FWIW, stemming is the basis of all "real" machine translation systems.

HTH,
Dan

Lexicon Software Ltd, Bath, UK
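Dan's "remov" example can be reproduced with a deliberately naive suffix-stripping function. This is a toy sketch with a hand-picked suffix list, not a real stemming algorithm such as the Porter stemmer, which applies ordered rules with conditions on the remaining stem; `naive_stem`, `SUFFIXES` and `MIN_STEM_LENGTH` are illustrative names.

```python
# Hand-picked English suffixes, ordered so that "es" is tried before "e".
SUFFIXES = ("ing", "es", "ed", "al", "er", "e")
MIN_STEM_LENGTH = 3  # never cut a word down below this many characters


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a minimum stem length."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


forms = ["remove", "removes", "removing", "removed", "removal", "remover"]
print({naive_stem(f) for f in forms})  # {'remov'}
```

A word generator could return `naive_stem(word)` instead of the raw word, so all six forms share a single index entry.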

Thu, Aug 31 2006 11:42 AM
Michael Baytalsky

Dan,

> If you have an inflected language, it's useful to isolate the stem of a noun or verb and use that for the search.

Very informative; I didn't know that, thanks! It's still not quite clear to me why one would need a double run on the same portion of text to do stemming (Tim's initial argument). But I guess it's not really important. I just wanted to bring a possible problem to Tim's attention, at which point it's totally up to him to figure out what to do or not do about it.

Cheers,
Michael

Thu, Aug 31 2006 6:04 PM
Tim Young [Elevate Software], Elevate Software, Inc. (timyoung@elevatesoft.com)

Roy,

<< This could present problems. I don't know about anyone else, but when I'm stripping codes from HTML I simply zap the whole of the style definition, i.e. everything between <style> and </style>. >>

You're getting word generation mixed up with text filtering. HTML code stripping would occur in a text filter, which is a different, but similar, type of external module that is called once for the entire BLOB contents.

--
Tim Young
Elevate Software
www.elevatesoft.com
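The division of labour Tim describes, a text filter that sees the entire BLOB once ahead of a word generator that is called per indexed word, is what makes Roy's approach work: only a whole-text pass can drop a complete `<style>...</style>` block. A minimal Python sketch using regular expressions; `filter_text` is a hypothetical name, and regex-based stripping is only adequate for simple, well-formed HTML.

```python
import re

# Everything from <style ...> to the matching </style>, across newlines.
STYLE_BLOCK_RE = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
# Any remaining tag such as <p>, </div> or <br/>.
TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def filter_text(blob: str) -> str:
    """Hypothetical text filter: runs once over the entire BLOB contents.

    Because it sees the whole document, it can drop a complete
    <style>...</style> block, which a per-word generator could not do.
    """
    text = STYLE_BLOCK_RE.sub(" ", blob)  # remove style definitions first
    text = TAG_RE.sub(" ", text)          # then strip the remaining tags
    return WHITESPACE_RE.sub(" ", text).strip()


html = "<html><style>p { color: red; }</style><p>Full text <b>indexing</b></p></html>"
print(filter_text(html))  # Full text indexing
```

The filtered string would then be passed to the word generator, which only ever sees plain text.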

Fri, Sep 1 2006 3:41 AM
Roy Lambert, NLH Associates (Team Elevate)

Tim,

Phew!

Roy Lambert

PS: I wasn't getting confused; I just wanted to make sure the other approach (which I currently use) was still there. Roll on ElevateDB (beta or whatever).

Fri, Sep 1 2006 5:12 PM
Tim Young [Elevate Software], Elevate Software, Inc. (timyoung@elevatesoft.com)

Roy,

<< PS: I wasn't getting confused; I just wanted to make sure the other approach (which I currently use) was still there. Roll on ElevateDB (beta or whatever). >>

No problem.

--
Tim Young
Elevate Software
www.elevatesoft.com

This web page was last updated on Tuesday, May 7, 2024 at 06:25 PM. © 2024 Elevate Software, Inc. All Rights Reserved.