Icon View Thread

The following is the text of the current message along with any replies.
Messages 1 to 9 of 9 total
Thread Strip HTML tags in sql-function
Fri, Mar 22 2019 9:47 AMPermanent Link

Yusuf Zorlu

MicrotronX - Speditionssoftware vom Profi

Hi,

anyone having a ready to use function to strip HTML tags?

Yusuf Zorlu
MicrotronX
Fri, Mar 22 2019 11:15 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Yusuf


Posting in the sql ng do you want it in SQL/PSM in which case NO or is Delphi OK in which case have a look in the extensions ng for my text filters which do strip the tags.

Roy Lambert
Fri, Mar 22 2019 11:30 AMPermanent Link

Yusuf Zorlu

MicrotronX - Speditionssoftware vom Profi

Roy Lambert wrote:

<< Posting in the sql ng do you want it in SQL/PSM in which case NO or is Delphi OK in which case have a look in the extensions ng for my text filters which do strip the tags.

i want it in SQL/PSM. Have seen something in other sql-syntax but not for elevatedb.

Yusuf Zorlu
MicrotronX
Fri, Apr 5 2019 4:05 AMPermanent Link

Yusuf Zorlu

MicrotronX - Speditionssoftware vom Profi

Heres a ready to use function which works for our case:

CREATE FUNCTION "mxStripHtml" (IN "HTMLText" VARCHAR COLLATE UNI)
RETURNS VARCHAR COLLATE UNI
BEGIN   DECLARE TagBegin integer;
  DECLARE TagEnd integer;
  DECLARE TagLength integer;
  DECLARE vtest integer;
  DECLARE Result varchar;

  SET Result = HTMLText;
  SET TagBegin=position('<' in HTMLText);
  SET vtest = 0;

  while (TagBegin>0) and (vtest<1000) DO --max 1000 loops to be sure
     SET TagEnd = position ('>' in Result);
     SET TagLength = TagEnd - TagBegin +1;
     if TagBegin=1 then
        SET Result = SUBSTRING(Result, TagBegin+TagLength, LENGTH(Result));
     end if;

     if TagBegin>1 then
        SET Result = SUBSTRING(Result, 1, TagBegin-1)+SUBSTRING(Result, TagBegin+TagLength, LENGTH(Result));
     end if;
  
     SET TagBegin = position('<' in result);
     SET vtest = vtest +1;
  end while;
  
  Return Result;
  
END
VERSION 1.00

Yusuf Zorlu
MicrotronX
Fri, Apr 5 2019 4:56 AMPermanent Link

Yusuf Zorlu

MicrotronX - Speditionssoftware vom Profi

Here's the part for removing complete tags like <head... xxx </head>

CREATE FUNCTION "mxRemoveHTMLFullTags" (
IN "HTMLText" VARCHAR COLLATE "UNI",
IN "HTMLTag" VARCHAR COLLATE "UNI")
RETURNS VARCHAR COLLATE "UNI"
BEGIN
  DECLARE TagBegin integer;
  DECLARE TagEnd integer;
  DECLARE TagLength integer;
  DECLARE Result varchar;
  DECLARE vTest integer;
  SET Result = HTMLText;

  SET TagBegin=position('<'+lower(htmltag) in lower(Result));
  SET vtest = 0;

  while (TagBegin>0) and (vtest<10) DO
     SET TagEnd = position ('</'+lower(htmltag)+'>' in Result);
     SET TagLength = TagEnd - TagBegin +length(htmltag)+3;
     if TagBegin=1 then
        SET Result = SUBSTRING(Result, TagBegin+TagLength, LENGTH(Result));
     end if;

     if TagBegin>1 then
        SET Result = SUBSTRING(Result, 1, TagBegin-1)+SUBSTRING(Result, TagBegin+TagLength, LENGTH(Result));
     end if;
                                                                  
     SET TagBegin=position('<'+lower(htmltag) in lower(Result));
     SET vtest = vtest +1;
  end while;
  
  Return Result;
  
END
VERSION 1.00



and now the new mxStripHtml:


CREATE FUNCTION "mxStripHtml" (IN "HTMLText" VARCHAR COLLATE "UNI")
RETURNS VARCHAR COLLATE "UNI"
BEGIN   DECLARE TagBegin integer;
  DECLARE TagEnd integer;
  DECLARE TagLength integer;
  DECLARE vtest integer;
  DECLARE Result varchar;

  SET Result = HTMLText;

  set Result = mxRemoveHTMLFullTags(Result, 'html');   
  set Result = mxRemoveHTMLFullTags(Result, 'script');

  SET TagBegin=position('<' in Result);
  SET vtest = 0;

  while (TagBegin>0) and (vtest<10) DO
     SET TagEnd = position ('>' in Result);
     SET TagLength = TagEnd - TagBegin +1;
     if TagBegin=1 then
        SET Result = SUBSTRING(Result, TagBegin+TagLength, LENGTH(Result));
     end if;

     if TagBegin>1 then
        SET Result = SUBSTRING(Result, 1, TagBegin-1)+SUBSTRING(Result, TagBegin+TagLength, LENGTH(Result));
     end if;
  
     SET TagBegin = position('<' in result);
     SET vtest = vtest +1;
  end while;
  
  Return Result;
  
END
VERSION 1.00


But to complete things we need to

1. replace some characters and add a NewLine i.e. make <p to <p+NewLine, and also <tr to <tr+NewLine etc. to make the result more readable.

Hope this helps someone.

Yusuf Zorlu
MicrotronX
Sat, Apr 6 2019 6:20 AMPermanent Link

Yusuf Zorlu

MicrotronX - Speditionssoftware vom Profi

if the text it only some lines this works, if the CLOB has a lot of data > 2kb it seems that the SUBSTRING or SET has a problem and i get an empty result.

Yusuf Zorlu
MicrotronX
Sat, Apr 6 2019 6:58 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Yusuf


It may be that its the maximum size of a VARCHAR (the documentation says 1024 bytes) I think that only applies to field sizes though.

Looking at your function all the parameters are VARCHAR, but your post says CLOB so how / where are you casting it?

Its probably worth creating a small test case and sending to Tim.

Roy Lambert
Mon, Apr 8 2019 3:35 AMPermanent Link

Yusuf Zorlu

MicrotronX - Speditionssoftware vom Profi

Roy Lambert wrote:

<< Looking at your function all the parameters are VARCHAR, but your post says CLOB so how / where are you casting it?

yes it was defined as varchar in first step, now it is CLOB but does not work also. I'll create a test-case for Tim.

Yusuf Zorlu
MicrotronX
Mon, Apr 8 2019 8:14 AMPermanent Link

Roy Lambert

NLH Associates

Team Elevate Team Elevate

Yusuf


Just done a bit of playing about and I'm pretty certain its using VARCHAR that's the problem. I was getting NULLs out so altered
it to

ALTER FUNCTION "mxStripHtml" (IN "HTMLText" CLOB COLLATE "ANSI")
RETURNS CLOB COLLATE "ANSI"
BEGIN   DECLARE TagBegin integer;
  DECLARE TagEnd integer;
  DECLARE TagLength integer;
  DECLARE vtest integer;
  DECLARE Result CLOB;

  SET Result = HTMLText;

--   set Result = mxRemoveHTMLFullTags(Result, 'html');
--   set Result = mxRemoveHTMLFullTags(Result, 'script');

  SET TagBegin=position('<' in Result);
  SET vtest = 0;

  while (TagBegin>0) and (vtest<10) DO
     SET TagEnd = position ('>' in Result);
     SET TagLength = TagEnd - TagBegin +1;
     if TagBegin=1 then
        SET Result = SUBSTRING(Result, TagBegin+TagLength, LENGTH(Result));
     end if;

     if TagBegin>1 then
        SET Result = SUBSTRING(Result, 1, TagBegin-1)+SUBSTRING(Result, TagBegin+TagLength, LENGTH(Result));
     end if;
 
     SET TagBegin = position('<' in result);
     SET vtest = vtest +1;
  end while;

  Return Result;
 
END
VERSION 1.00


and I get an output - haven't checked it for validity but at least I get an output.

Unless there are really good grounds for sticking with SQL I'd look at using an external function written in Delphi.

Roy Lambert
Image