Login ProductsSalesSupportDownloadsAbout |
Home Technical Support ElevateDB Technical Support Support Forums ElevateDB SQL View Thread |
Messages 1 to 10 of 10 total |
UPPER() does not convert cyrillic text |
Wed, Dec 20 2017 11:06 AM | Permanent Link |
Malcolm Taylor | I just noticed that when I use the SQL UPPER() function it appears to
bail out with Cyrillic text. Googling sort of explains it, but in pure Delphi code I do get the conversion. I am not much bothered about it, now that I see it is not my bug, but is this ever likely to be addressed in EDB? |
Fri, Dec 22 2017 1:50 PM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Malcolm,
<< I just noticed that when I use the SQL UPPER() function it appears to bail out with Cyrillic text. >> In what context is the UPPER() function being used ? In other words, what is being upper-cased ? UPPER() and LOWER() use the collation of the target expression for determining how the transformation should take place: https://www.elevatesoft.com/articles?action=view&category=edb&article=collations_comparisons So, you could force a specific collation by using something like this: UPPER(MyColumn) becomes UPPER(MyColumn COLLATE RUS) Or, even easier, if you're simply trying to do case-insensitive comparisons, you could just simply do this: MyColumn COLLATE RUS_CI = 'Text' and EDB will try to find an index on the MyColumn column with a collation of RUS_CI in order to optimize the condition. Tim Young Elevate Software www.elevatesoft.com |
Fri, Dec 22 2017 6:08 PM | Permanent Link |
Malcolm Taylor | Tim
I am not using it for comparisons or collations, but just for presentation. I am using UNICODE and the columns in question are peoples' names. In Europe at least, there are differing conventions for displaying names so I store names with initial capitals (as best I can) then allow users to choose one of the 6 common variations for display purposes. So if they choose the "FAMILYNAME, Firstname" format I simply .. SELECT UPPER(FamilyName) + ', ' + FirstName, .... Mostly it works fine, but I just noticed that if there are any Cyrillic names in the list they are not converted, so for this format are presented like "Толстяков, Антон" instead of "ТОЛСТЯКОВ, Антон". I suppose there will be other languages where either there is no comparable convention (no problem), or there may be a similar issue. If I am missing a trick I would like to know, but if that is just the way it works I am not too bothered. No user has 'complained'. Malcolm |
Wed, Aug 1 2018 5:41 AM | Permanent Link |
Malcolm Taylor | Tim Young [Elevate Software] wrote:
...snip.. > So, you could force a specific collation by using something like this: > > UPPER(MyColumn) > > becomes > > UPPER(MyColumn COLLATE RUS) > ...snip.. sorry, been doing other things My data columns in question are defined as collation UNI_CI. OK, so I tried including COLLATE RUS .. and it works, as you suggested. So I wondered what if the query returns text in a mix of character sets? Using COLLATE CHS also works for cyrillic. Hmm. Using COLLATE ENU ditto ! Using COLLATE UNI does not return uppercase cyrillic but does work for many other character sets. So at least for now, is it enough for me to include COLLATE ENU (or almost any other collation except UNI) in order to get the returned text in upper case? Is there some reason why COLLATE UNI does not work .. or is that just because the collation is already known from the Column definition and the problem is that the UPPER() function does not yet support UTF-16 completely? This is not quite what I expected to discover so I must be missing something (brains?). I thought that collations were important for ordering/matching and would not affect letter case. Malcolm |
Wed, Aug 1 2018 7:08 AM | Permanent Link |
Wolfgang Klein | I had a similar discusssion with Tim almost 3 years ago. Unfortunately I was not able to convince him that the current implementation of UNI_CI is wrong.
In the end I had to modify ElevateDB myself. It's just two lines in edbstring.pas: UnicodeLowerCharMap[I]:=UnicodeChar(I); -> UnicodeLowerCharMap[I]:=UnicodeChar(TEDBIntPtr(CharLower(pUnicodeChar(I)))); and UnicodeUpperCharMap[I]:=UnicodeChar(I); -> UnicodeUpperCharMap[I]:=UnicodeChar(TEDBIntPtr(CharUpper(pUnicodeChar(I)))); Wolfgang |
Wed, Aug 1 2018 12:19 PM | Permanent Link |
Malcolm Taylor | Wolfgang Klein wrote:
> I had a similar discusssion with Tim almost 3 years ago. > Unfortunately I was not able to convince him that the current > implementation of UNI_CI is wrong. > > In the end I had to modify ElevateDB myself. It's just two lines in > edbstring.pas: > > UnicodeLowerCharMap[I]:=UnicodeChar(I); > -> > UnicodeLowerCharMap[I]:=UnicodeChar(TEDBIntPtr(CharLower(pUnicodeChar( > I)))); > > and > > UnicodeUpperCharMap[I]:=UnicodeChar(I); > -> > UnicodeUpperCharMap[I]:=UnicodeChar(TEDBIntPtr(CharUpper(pUnicodeChar( > I)))); > > > Wolfgang Hi Wolfgang Thanks, that is interesting. I don't have the source so I can't try that, but what puzzles me is why the SQL UPPER/LOWER functions return UPPER or LOWER for most of languages I encounter but not other ones. I know that some languages do not have directly equivalent upper and lower case characters or even no concept of letter case, but for those that do the functions should 'work'. I am with you in expecting the upper/lower equivalents in the Unicode Specification to be returned and I wonder if Tim may be using one of the earlier Delphi functions or maps, now deprecated, to serve up the values. But that's not like Tim. Malcolm |
Mon, Aug 6 2018 1:56 PM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Wolfgang,
<< I had a similar discusssion with Tim almost 3 years ago. Unfortunately I was not able to convince him that the current implementation of UNI_CI is wrong. >> It *doesn't matter* if it's wrong for certain linguistic conversions. What matters is that changing it will break *every single index* that uses it, and that's no good. The existing UNI collation is a simple basic character mapping that covers primarily English and Western European mappings. It's the equivalent to similar simple mappings in SQL Server, etc. for straight binary code point comparisons, with a slight twist. If you need a *linguistically-correct* upper-casing result, then you need to specify a collation that will provide it. << In the end I had to modify ElevateDB myself. It's just two lines in edbstring.pas: >> For anyone reading, I would *not* recommend doing that. If you moved the DB to another OS, won't work anymore, and it is dependent upon the current system locale for ANSI versions of the function (non-Unicode Delphi versions). Remember, EDB still has to work across multiple OS's, as well as with ANSI and Unicode strings, without changing behaviors. Tim Young Elevate Software www.elevatesoft.com |
Mon, Aug 6 2018 2:00 PM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Malcolm,
<< So I wondered what if the query returns text in a mix of character sets? >> In that case, your results will be mixed, depending upon which collation you're using. Some will cope, and others will not. << So at least for now, is it enough for me to include COLLATE ENU (or almost any other collation except UNI) in order to get the returned text in upper case? >> Yes. << Is there some reason why COLLATE UNI does not work .. >> Yes, it's not designed to work with anything other than English and other western European languages. It's a simple mapping that is similar to a basic binary code point comparison. Tim Young Elevate Software www.elevatesoft.com |
Mon, Aug 6 2018 6:50 PM | Permanent Link |
Malcolm Taylor | Tim Young [Elevate Software] wrote:
> > << Is there some reason why COLLATE UNI does not work .. >> > > Yes, it's not designed to work with anything other than English and > other western European languages. It's a simple mapping that is > similar to a basic binary code point comparison. > Thanks for the clarification, Tim. Ah, yes, the good old ASCII 'case' bit, updated to ANSI-ish. Well, it is what it is. COLLATE ENG seems to work for me - certainly better than before. Malcolm |
Wed, Aug 8 2018 1:51 PM | Permanent Link |
Tim Young [Elevate Software] Elevate Software, Inc. timyoung@elevatesoft.com | Malcolm,
<< Thanks for the clarification, Tim. Ah, yes, the good old ASCII 'case' bit, updated to ANSI-ish. >> Yeah, it's very basic, but it's portable and works the same for both ANSI and Unicode code points. Tim Young Elevate Software www.elevatesoft.com |
This web page was last updated on Tuesday, April 30, 2024 at 03:55 PM | Privacy PolicySite Map © 2024 Elevate Software, Inc. All Rights Reserved Questions or comments ? E-mail us at info@elevatesoft.com |