Thread: Anyone have clever ways of getting word count of text-field data?
Tue, May 25 2021 10:40 AM

Adam Brett

Orixa Systems

I have CLOB data holding plain text, and I want to get back Word Count on the fields.

I can estimate (find the length of the CLOB and divide by some number for the average length of a word) but it would be great to be reasonably accurate.
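For reference, the length-based estimate could be sketched like this in Python. The function name and the default of 6 characters per word (roughly five letters plus a space) are assumptions for illustration only; the right divisor depends on the language and style of the text.

```python
def estimate_word_count(text, avg_chars_per_word=6.0):
    """Rough word count: total characters divided by an assumed average.

    avg_chars_per_word is a guess (letters plus one separator), not a
    measured value; tune it against a sample of real documents.
    """
    if not text:
        return 0
    return round(len(text) / avg_chars_per_word)
```

This is cheap because it only needs the length of the CLOB, but it drifts on text with many short or many long words.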

I would love to do it with SQL, so I can return a dataset with the Text and the WordCount as fields. This would mean I can use it anywhere, inside a Delphi App or in an ODBC client.

If I could COMPUTE it so that the field was permanently available that would be ideal.

I don't see a SQL command in EDB to do anything like this.

... I can also foresee users wanting to do the same with HTML or RTF data. It would be great to be ahead of the curve and write something that copes with much of the document being tags or decorations rather than words.
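To make the HTML case concrete, here is a minimal sketch in Python using only the standard library. It is not an EDB feature: the class and function names are hypothetical, it covers HTML only (RTF would need its own parser), and the word pattern is one possible definition of a "word".

```python
import re
from html.parser import HTMLParser

# One possible definition of a word: runs of letters/digits, allowing
# an internal apostrophe or hyphen (e.g. "don't", "well-known").
WORD_RE = re.compile(r"\w+(?:['-]\w+)*")


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring tags and script/style bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Joining with a space keeps "<p>one</p><p>two</p>" as two words,
        # at the cost of splitting a word that has a tag in the middle.
        return " ".join(self._chunks)


def html_word_count(html):
    """Count words in the visible text of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(html or "")
    parser.close()
    return len(WORD_RE.findall(parser.text()))
```

For example, `html_word_count("<p>Hello <b>brave</b> new world</p>")` counts only the four visible words, not the tag names.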
Wed, May 26 2021 2:12 AM

Roy Lambert

NLH Associates

Team Elevate

Adam


The only way to do it accurately is to split the document into words and count them. Have a look at my word generator module in the extensions newsgroup. It will generate a word list for you, though it may need a bit of tweaking for your exact needs. As a hint, if you buy MSWord from Mike Skolnik (scalabium.com) you can handle PDF, Word and other formats as well. I'd also recommend using a real column to store the count, since generating word counts does take time.
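As an illustration of the split-and-count idea (this is not the word generator module mentioned above, just a minimal Python sketch with a hypothetical function name and an assumed definition of what counts as a word):

```python
import re

# Assumed word definition: letters/digits, with an optional internal
# apostrophe or hyphen so "don't" and "well-known" count once each.
WORD_RE = re.compile(r"\w+(?:['-]\w+)*")


def word_count(text):
    """Count words by scanning the text once for word-shaped tokens."""
    if not text:
        return 0
    return sum(1 for _ in WORD_RE.finditer(text))
```

Because this scans the whole text, the cost grows with document size, which is the reason for calculating the count once when the text changes and saving it in a real column rather than recomputing it on every query.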

Roy Lambert