| Author |
Message |
< Erlang ~ length of utf-8 string |
| vinnitu |
Posted: Fri Jun 05, 2009 2:48 pm |
|
|
|
User
Joined: 31 May 2007
Posts: 14
|
Hi.
What is the right way to determine the length of a UTF-8 string?
If I use length(), it returns 4 for two Russian letters but 2 for two English letters.
How can I get the same length in both cases?
Thanks |
|
|
|
| seanmc |
Posted: Mon Jun 08, 2009 8:22 am |
|
|
|
User
Joined: 03 Aug 2007
Posts: 10
|
Hi Vinnitu,
I can't find any examples here to verify this but try:
string:len("example russian characters").
//Sean. |
|
|
|
| alexarnon |
Posted: Wed Jun 24, 2009 8:14 am |
|
|
|
User
Joined: 26 Jan 2008
Posts: 14
|
Try: length(xmerl_ucs:from_utf8("example russian here")).
Using string:len(...) will simply return the original string/list's length. |
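
To illustrate the difference described above, here is a minimal sketch. It assumes the string arrives as a list of raw UTF-8 bytes (which matches the 4-for-2-letters result in the original question); the module and function names are hypothetical.

```erlang
-module(xmerl_len_demo).
-export([demo/0]).

%% [208,180,208,176] are the UTF-8 bytes of the two Cyrillic letters
%% U+0434 and U+0430, written as integers so the result does not depend
%% on the encoding of the source file.
demo() ->
    Bytes = [208, 180, 208, 176],
    ByteCount = length(Bytes),                        % counts list elements (bytes)
    CodePoints = xmerl_ucs:from_utf8(Bytes),          % decode to code points
    {ByteCount, length(CodePoints)}.
```

Calling xmerl_len_demo:demo() should return {4, 2}: four bytes, two code points.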
|
|
|
| rvirding |
Posted: Thu Jun 25, 2009 11:47 pm |
|
|
|
User
Joined: 30 Aug 2006
Posts: 452
Location: Stockholm, Sweden
|
What do you mean when you say the "length of a utf-8 string"? Do you mean the number of code points, or how many bytes it takes in the current encoding? Or something else?
There is no built-in way to do this safely; it very much depends on how you store your string. If it is a list, then you probably want the length of the list, since it is recommended that lists hold code points. If it is a binary, then the size of the binary gives the number of bytes in the encoding used. |
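
A small sketch of the two measurements mentioned above, assuming the string is held as a list of code points and then encoded to a UTF-8 binary. The module and function names are hypothetical.

```erlang
-module(utf8_len_demo).
-export([list_vs_binary/0]).

list_vs_binary() ->
    %% Two Cyrillic letters (U+0434, U+0430) as Unicode code points.
    CodePoints = [1076, 1072],
    %% Encode to UTF-8; each of these letters takes two bytes.
    Utf8 = unicode:characters_to_binary(CodePoints),
    %% length/1 counts list elements, byte_size/1 counts bytes.
    {length(CodePoints), byte_size(Utf8)}.
```

Here utf8_len_demo:list_vs_binary() should return {2, 4}: the same text has length 2 as a code point list and size 4 as a UTF-8 binary.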
|
|
|
| Allan |
Posted: Mon Jun 29, 2009 4:37 pm |
|
|
|
User
Joined: 29 Jun 2009
Posts: 30
|
vinnitu wrote: What is the right way to determine the length of a UTF-8 string?
Since 5.6/OTP R12B, Erlang has had some Unicode support. A standard string in Erlang is either a list of Unicode code points or a binary containing UTF-8 encoded code points.
So I guess you have a UTF-8 binary and want to know its length in code points / characters.
The easiest way to get this is to use length(unicode:characters_to_list(Utf8_binary)). |
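
One thing worth noting about the call above: unicode:characters_to_list/2 returns an error or incomplete tuple instead of a list when the binary is not valid UTF-8, and length/1 would then fail. The sketch below wraps the call to handle that; the module name, function name, and the error atoms are hypothetical.

```erlang
-module(utf8_codepoints).
-export([count/1]).

%% Count the code points in a UTF-8 binary.
count(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        Chars when is_list(Chars) ->
            {ok, length(Chars)};
        {error, _Decoded, _Rest} ->
            {error, invalid_utf8};       % malformed byte sequence
        {incomplete, _Decoded, _Rest} ->
            {error, truncated_utf8}      % input ends mid-character
    end.
```

For example, utf8_codepoints:count(<<208,180,208,176>>) should give {ok, 2}, while a binary cut off in the middle of a character, such as <<208>>, should give {error, truncated_utf8}.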
|
|
|
| baryluk |
Posted: Tue Aug 18, 2009 10:06 am |
|
|
|
User
Joined: 05 Aug 2009
Posts: 48
|
| This depends on what you mean by length. Unicode allows many character modifiers, which can come before or after a code point, so 20 bytes of UTF-8 can be a single character. If you want to know the width on the screen, I think there are some functions for this. |
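
The point about modifiers can be sketched as follows. This assumes OTP 20 or later, where string:length/1 counts grapheme clusters (user-perceived characters); that function did not exist when this thread was written. The module and function names are hypothetical.

```erlang
-module(grapheme_len).
-export([compare/0]).

compare() ->
    %% "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points
    %% that render as one character.
    Str = [101, 769],
    CodePointCount = length(Str),          % counts code points
    GraphemeCount = string:length(Str),    % counts grapheme clusters (OTP 20+)
    {CodePointCount, GraphemeCount}.
```

On OTP 20 or later, grapheme_len:compare() should return {2, 1}. Neither number is the on-screen width, which also depends on the font and terminal.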
|
|
|
|
|