Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 8 of 8 total
Thread Ignore punctuation
Thu, Nov 6 2014 5:00 PMPermanent Link

Beni

Hello all,

I have a request from my customer to ignore punctuation in the search. By example in the field I may have the O'brien text and the record should be returned for both O'brien and Obrien search condition. Any hint on how how to build the query?
Thu, Nov 6 2014 5:26 PMPermanent Link

Adam Brett

Orixa Systems

Beni wrote:

>>record should be returned for both O'brien and Obrien search condition.

You can do this 2 ways:

Write nested REPLACE statements,
Write a custom-function.

--
1. Nested REPLACE:

Here is 1 REPLACE statement:

REPLACE('''' WITH '' IN <search-string>)

The above removes and single quote.

REPLACE('.' WITH '' IN <search-string>)

removes full stop.

You can nest them:

REPLACE((',' WITH '' IN REPLACE('.' WITH '' IN <search-string>))

--
2 Function: The obvious thing to do is to write a function which does all the Nested Replacements you want (i.e. for ALL the punctuation) then call this:

CREATE FUNCTION RemovePunct(IN SearchTerm VARCHAR)
RETURNS VARCHAR
BEGIN
 DECLARE Result VARCHAR;
 SET Result = REPLACE(REPLACE(REPLACE .... etc.
RETURN Result
END;

Then you can write:

WHERE RemovePunct(<searchterm>) = RemovePunct(<fieldName>)
Fri, Nov 7 2014 6:34 AMPermanent Link

Beni

Thanks for the idea! Will give it a try.
Fri, Nov 7 2014 1:25 PMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Beni


Adam's idea is good and it will work, however, I suggest you carry out some time tests on it. I don't know how Tim has implemented REPLACE but I recently wrote a custom replace function (in Delphi) and rather than using a next of StringReplace statements walked the string and replaced stuff as needed. The net result was a function 12 times faster.

The amount of time taken for any one usage wasn't much (ie not really measurable with gettickcount) but if you're doing thousands (eg querying a large table) it adds up.

Roy Lambert
Fri, Nov 7 2014 3:06 PMPermanent Link

Beni

Roy Lambert wrote:

Beni


Adam's idea is good and it will work, however, I suggest you carry out some time tests on it. I don't know how Tim has implemented REPLACE but I recently wrote a custom replace function (in Delphi) and rather than using a next of StringReplace statements walked the string and replaced stuff as needed. The net result was a function 12 times faster.

The amount of time taken for any one usage wasn't much (ie not really measurable with gettickcount) but if you're doing thousands (eg querying a large table) it adds up.

Roy Lambert

My problem is that the punctuation may be in the field or may be in the search condition or may be in both places.
Sat, Nov 8 2014 3:14 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Beni

>My problem is that the punctuation may be in the field or may be in the search condition or may be in both places.

In that case you have two choices:

1. write a custom function to strip the punctuation from a string and use it like

WHERE NoPunctuation(field) = NoPunctuation(condition)

2. write a custom function carrying out the whole test (my prefered option) and use it like

WHERE NoPunctuation(field,condition)

You could develop the custom function in Delphi and supply as a dll or in sql itself

Roy Lambert
Mon, Nov 10 2014 3:59 AMPermanent Link

Matthew Jones

Roy Lambert wrote:
> In that case you have two choices:

I'll add a third option that has proved useful to me in the past:

3. Create another field that has the text "pre-processed" to match the
requirement without special characters. You can then search this
quickly without having to process it each time.

This is only worth doing if the cost of search is high, and done
regularly.


--

Matthew Jones
Mon, Nov 10 2014 4:22 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Matthew

>I'll add a third option that has proved useful to me in the past:
>
>3. Create another field that has the text "pre-processed" to match the
>requirement without special characters. You can then search this
>quickly without having to process it each time.
>
>This is only worth doing if the cost of search is high, and done
>regularly.

Yup and preprocess the query so the condition has punctuation removed and you get the best performance. You can even use Tim's nice GENERATED column type (I think) to manage the punctuation stripping.

There's also a fourth option if you're willing to build a word generator & Text filter - use full text indexing, strip the punctuation when building the index and the condition when building the query

Something I wonder about, and Beni hasn't said, is do his customers want the sorting of these columns to follow punctated or non-punctuated sequences?


Roy Lambert
Image