TextIndex CLOB Problem
Wed, Oct 28 2009 9:56 AM
Igor Colovic

I have downloaded the trial version of EDB 2.03 (NoUnicode) and have some problems with TextIndex.

The situation is like this:

Table Documents:
  ID Integer
  Document CLOB
  DocType CHAR(4)

I want to create a TextIndex on this table. Document can be one of these types: HTML, RTF, or RVF (our internal format). I have created a TextFilterModule for RVF conversion to plain text. The problem is that TextToFilter does not contain all the data from the Document field. It is truncated at the first #0 character. This was not a problem with DBISAM (which I would like to replace with EDB).

You can replicate this by setting the value of a CLOB field to something with #0 in the middle of the text:

Document.Value := 'BROW'#0'FOX';

The content of the field will be displayed correctly, but the TextIndex will only have BROW in the index.

Best regards
Igor Colovic
Wed, Oct 28 2009 10:49 AM
Roy Lambert, NLH Associates [Team Elevate]

Igor

Whilst I don't know for certain, my bet is that with DBISAM, as parameters to the procedure call, you had

(Sender: TObject; const TableName, FieldName: string; var TextToIndex: string);

whereas in ElevateDB you have

(const FilterType: string; const TextToFilter: string; var FilteredText: string)

#0 is the string delimiter, so it's never even getting in there. I think you're going to have to replace #0 with something else.

Roy Lambert [Team Elevate]
Wed, Oct 28 2009 11:19 AM
Tim Young [Elevate Software], Elevate Software, Inc., timyoung@elevatesoft.com

Igor,

<< I have created TextFilterModule for RVF conversion to plain text. The problem is that TextToFilter dose not contains all data from Document field. It is trimmed in first #0 character. >>

You cannot embed #0 characters in CLOB columns and have the text indexing work correctly with an external word generator or text filter DLL. The text filter DLL uses pAnsiChar and pWideChar types to transfer data back and forth, which means that any #0 characters will serve to truncate the text. You should use something like tab (#9) characters or some other control character below #32 to delimit your text.

--
Tim Young
Elevate Software
www.elevatesoft.com
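For readers who want to see the truncation Tim describes, here is a minimal sketch in Python using ctypes as a stand-in for the pAnsiChar hand-off. It is not ElevateDB's filter API: the helper names `as_c_string` and `replace_nul` are hypothetical, and the only point is that a consumer reading a null-terminated buffer stops at the first #0, while a tab (#9) delimiter survives.

```python
import ctypes

def as_c_string(data: bytes) -> bytes:
    """Return what a consumer of a null-terminated char buffer would see.

    Hypothetical helper: stands in for passing text across a DLL boundary
    as a pAnsiChar-style pointer.
    """
    buf = ctypes.create_string_buffer(data)  # copies data and appends a terminating NUL
    return buf.value                         # reads only up to the first NUL byte

def replace_nul(data: bytes, delimiter: bytes = b"\t") -> bytes:
    """Swap embedded #0 bytes for another control character (tab by default)."""
    return data.replace(b"\x00", delimiter)

doc = b"BROW\x00FOX"
print(as_c_string(doc))               # only the part before the #0 survives
print(as_c_string(replace_nul(doc)))  # the full text survives with a tab delimiter
```

The full bytes are still present in the buffer itself, which is consistent with the CLOB displaying correctly inside the application while the filter only sees the first part.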
Wed, Oct 28 2009 11:23 AM
Igor Colovic

I cannot replace #0 with anything. It comes from another component, and we use it as our internal format for documents.

If your statement is correct, why can I display data from the CLOB correctly in the application?

Can you create another version of TEDBTextFilterModule which will use a buffer (TEDBBufferFilterModule)? I think that would clear up this problem.

Roy Lambert wrote:

Igor
....
(const FilterType: string; const TextToFilter: string; var FilteredText: string)

#0 is the string delimiter so its never even getting in there. I think you're going to have to replace #0 with something else.

Roy Lambert [Team Elevate]
Wed, Oct 28 2009 11:53 AM
Roy Lambert, NLH Associates [Team Elevate]

Igor

>I can not replace #0 with anything. It is from another component and we use it as our
>internal format for documents.

I have no idea if it will work, but what about setting up a calculated field, using REPLACE(field,#0,#8), and indexing that?

>If your statement is correct why can I display data from CLOB correctly in application.

Loading a CLOB into a DBMemo or some such is a different operation from passing strings to be edited.

>Can you create another version of TEDBTextFilterModule witch will use
>buffer(TEDBBufferFilterModule)?
>I think that would clear this problem.

I can't (I'm just another user). Tim might be able to, but that's his decision.

Roy Lambert [Team Elevate]
Wed, Oct 28 2009 12:35 PM
Roy Lambert, NLH Associates [Team Elevate]

Igor

Following on from my suggestion, I just tried

ALTER TABLE "EMails"
ADD COLUMN "DUMP" CLOB COLLATE "ANSI" COMPUTED ALWAYS AS REPLACE(#0,#8,_Message)

CREATE TEXT INDEX "dumping" ON "EMails" ("DUMP" COLLATE "ANSI_CI")
INDEXED WORD LENGTH 30
WORD GENERATOR "Default"

and it works fine.

I think I'm right in that COMPUTED columns aren't stored (GENERATED ones are), so there's no impact on disk space and, I'd guess, not too much on performance.

Roy Lambert
Wed, Oct 28 2009 1:53 PM
Tim Young [Elevate Software], Elevate Software, Inc., timyoung@elevatesoft.com

Igor,

<< If your statement is correct why can I display data from CLOB correctly in application. >>

Because we use normal strings (AnsiString/WideString/UnicodeString) in EDB, and what you're talking about is an external DLL. The only way to get strings to and from a DLL with reasonable performance is to use null-terminated pAnsiChar/pWideChar/pUnicodeChar strings.

<< Can you create another version of TEDBTextFilterModule witch will use buffer(TEDBBufferFilterModule)? >>

Maybe, but I'd have to look into it. Another option for you might be for us to simply surface an event handler in the TEDBSession component instead, which will remove the necessity of using the external DLL. However, both of these are things that will have to wait until 2.04 because they will involve configuration changes.

--
Tim Young
Elevate Software
www.elevatesoft.com
Thu, Oct 29 2009 4:40 AM
Igor Colovic

"Tim Young [Elevate Software]" wrote:

...
<< Maybe, but I'd have to look into it. Another option for you might be for us to simply surface an event handler in the TEDBSession component instead, which will remove the necessity of using the external DLL. However, both of these are things that will have to wait until 2.04 because they will involve configuration changes. >>

That would be very nice. Either solution will fit my needs.

P.S. Can you tell me the time frame for 2.04?

Best regards
Igor Colovic
Thu, Oct 29 2009 5:05 AM
Roy Lambert, NLH Associates [Team Elevate]

Tim

>Maybe, but I'd have to look into it. Another option for you might be for us
>to simply surface an event handler in the TEDBSession component instead,
>which will remove the necessity of using the external DLL. However, both of
>these are things that will have to wait until 2.04 because they will involve
>configuration changes.

That sounds interesting, but wouldn't we then be back to the DBSys days of having to recompile DBSys to get the code in?

What's your views on my idea of a COMPUTED column?

Roy Lambert
Thu, Oct 29 2009 6:14 AM
Igor Colovic

Roy Lambert wrote:

What's your views on my idea of a COMPUTED column?

Roy

A COMPUTED column is not an option. These #0 chars are part of the internal document format (header). Text in the document is saved as a MultiByte string, and a document can contain pictures.

Changing all #0 to #8 would allow the data to be passed to the TextFilterModule, but I would then have to change all #8 back to #0 before I could get plain text from the document. That is fine if documents are small. But my documents are 10-40MB each and there are ~30000 of them (and the number is growing). This would be a bottleneck in text indexing.

Best regards
Igor Colovic
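To make the round trip and its cost concrete, here is a rough sketch in Python rather than Delphi. The names `to_indexable`, `restore`, and `total_bytes` are hypothetical and nothing here is ElevateDB code. It assumes #8 as the substitute byte, as in Roy's suggestion, and decimal megabytes for the volume estimate.

```python
NUL = b"\x00"
MARKER = b"\x08"  # substitute byte (#8), taken from Roy's suggestion

def to_indexable(doc: bytes) -> bytes:
    """First pass: replace every #0 so the text survives a null-terminated hand-off."""
    return doc.replace(NUL, MARKER)

def restore(doc: bytes) -> bytes:
    """Second pass inside the filter: turn every #8 back into #0 before converting."""
    return doc.replace(MARKER, NUL)

def total_bytes(doc_count: int, mb_per_doc: int) -> int:
    """Data volume touched by one full pass, assuming 1 MB = 1,000,000 bytes."""
    return doc_count * mb_per_doc * 1_000_000

# A document whose only control bytes are #0 round-trips exactly.
header_doc = b"HDR\x00\x00plain text"
print(restore(to_indexable(header_doc)) == header_doc)

# If the payload (for example picture data) already contains #8,
# the restore step cannot tell it apart from a substituted #0.
binary_doc = b"HDR\x00img\x08data"
print(restore(to_indexable(binary_doc)) == binary_doc)

# Volume per pass for ~30000 documents of 10 MB and 40 MB each.
print(total_bytes(30_000, 10), total_bytes(30_000, 40))
```

With the figures quoted above, each pass covers roughly 300 GB to 1.2 TB, and the substitution adds two such passes on top of the RVF conversion itself. The second check is a further caveat under the stated assumption: the restore is only exact when the substitute byte never occurs naturally in the document.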