A Lua implementation of the JS function charCodeAt

charCodeAt by Lua

@(Lua JavaScript charCodeAt)

I wanted a charCodeAt function in Lua that works exactly like the one in JavaScript,
but Lua 5.1 has no built-in support for UTF-8 or Unicode: its strings are plain byte sequences.

1: How charCodeAt works in JavaScript

To open the console in Chrome, press F12 (on a Mac: Cmd+Alt+J), then run:

[
'你'.charCodeAt(0),
'ñ'.charCodeAt(0),
'n'.charCodeAt(0)
]

It outputs [20320, 241, 110], the numeric Unicode values of the characters: '你' = 20320, 'ñ' = 241, 'n' = 110.

The charCodeAt() method returns the numeric Unicode value of the character at the given index, except for code points of 0x10000 and above, where it returns one half of a UTF-16 surrogate pair.

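Lua sees a string as bytes, so we first need to know how many bytes one UTF-8 character takes. The function utf8.charbytes from alexander-yakushev's utf8.lua works this out from the first byte of the character:
[https://github.com/alexander-yakushev/awesompd/blob/master/utf8.lua]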

local utf8 = {}  -- Lua 5.1 has no built-in utf8 table, so define one

function utf8.charbytes (s, i)
   -- argument defaults
   i = i or 1
   local c = string.byte(s, i)
   -- determine bytes needed for character, based on RFC 3629
   if c > 0 and c <= 127 then
      -- UTF8-1 byte
      return 1
   elseif c >= 194 and c <= 223 then
      -- UTF8-2 byte
      return 2
   elseif c >= 224 and c <= 239 then
      -- UTF8-3 byte
      return 3
   elseif c >= 240 and c <= 244 then
      -- UTF8-4 byte
      return 4
   end
end
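
As a rough sketch of how such a byte-length function can be used, the snippet below walks a string one character at a time. The names charbytes and utf8_chars are hypothetical helpers written for this illustration, not part of the linked library, and the input is assumed to be valid UTF-8.

```lua
-- Byte length of the UTF-8 character whose first byte is at position i.
-- Uses the same lead-byte ranges as utf8.charbytes, except that byte 0
-- is counted as a 1-byte character and invalid lead bytes raise an error.
local function charbytes(s, i)
   local c = string.byte(s, i)
   if c <= 127 then return 1
   elseif c >= 194 and c <= 223 then return 2
   elseif c >= 224 and c <= 239 then return 3
   elseif c >= 240 and c <= 244 then return 4
   end
   error("invalid UTF-8 lead byte at position " .. tostring(i))
end

-- Split a UTF-8 string into a list of characters (hypothetical helper).
local function utf8_chars(s)
   local chars, i = {}, 1
   while i <= #s do
      local n = charbytes(s, i)
      chars[#chars + 1] = string.sub(s, i, i + n - 1)
      i = i + n
   end
   return chars
end

-- "你ñn" written as decimal byte escapes so the source file encoding does not matter
local chars = utf8_chars("\228\189\160\195\177n")
print(#chars)  -- 3
```

The string is 6 bytes long, but the loop advances by 3, 2 and 1 bytes and so finds 3 characters.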

Converting between Unicode and UTF-8

Unicode code point range (hex)   UTF-8 encoding (binary)               Example character
0000 0000-0000 007F              0xxxxxxx                              n (ASCII letters)
0000 0080-0000 07FF              110xxxxx 10xxxxxx                     ñ
0000 0800-0000 FFFF              1110xxxx 10xxxxxx 10xxxxxx            你 (most CJK)
0001 0000-0010 FFFF              11110xxx 10xxxxxx 10xxxxxx 10xxxxxx   other characters
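
The table can be turned into a decoder by stripping the marker bits from each byte and joining the remaining x bits. Lua 5.1 has no bitwise operators, so this minimal sketch uses multiplication and subtraction instead. The name utf8_decode is made up for this example, and the input is assumed to be valid UTF-8 (continuation bytes are not checked).

```lua
-- Decode the UTF-8 character that starts at byte i of s.
-- Returns the Unicode code point and the number of bytes consumed.
-- Assumption: s is valid UTF-8; no validation is done here.
local function utf8_decode(s, i)
   i = i or 1
   local b1 = string.byte(s, i)
   if b1 < 0x80 then
      -- 0xxxxxxx
      return b1, 1
   elseif b1 < 0xE0 then
      -- 110xxxxx 10xxxxxx
      local b2 = string.byte(s, i + 1)
      return (b1 - 0xC0) * 0x40 + (b2 - 0x80), 2
   elseif b1 < 0xF0 then
      -- 1110xxxx 10xxxxxx 10xxxxxx
      local b2, b3 = string.byte(s, i + 1, i + 2)
      return (b1 - 0xE0) * 0x1000 + (b2 - 0x80) * 0x40 + (b3 - 0x80), 3
   else
      -- 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      local b2, b3, b4 = string.byte(s, i + 1, i + 3)
      return (b1 - 0xF0) * 0x40000 + (b2 - 0x80) * 0x1000
             + (b3 - 0x80) * 0x40 + (b4 - 0x80), 4
   end
end

-- "你" is the three bytes E4 BD A0
local cp, len = utf8_decode("\228\189\160")
print(cp, len)  -- 20320 and 3
```

For '你' this gives (0xE4 - 0xE0) * 4096 + (0xBD - 0x80) * 64 + (0xA0 - 0x80) = 16384 + 3904 + 32 = 20320, the same value charCodeAt returns in JavaScript.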

But we should pay attention to 4-byte UTF-8 characters (such as emoji); they do not work that simply.

Special method

JavaScript engines use UTF-16. For characters in the Basic Multilingual Plane, the UTF-16 value is the same as the Unicode code point, but a character in a Supplementary Plane is stored as two values (a surrogate pair), computed with the formula below. The Supplementary Plane characters we usually encounter are emoji (4-byte UTF-8 characters).

-- formula 1: c is the code point (>= 0x10000), H and L are the high and low surrogates
H = math.floor((c - 0x10000) / 0x400) + 0xD800
L = (c - 0x10000) % 0x400 + 0xDC00
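
To check formula 1 against a real character, here is a small sketch that wraps it in a function and applies it to U+1F600 (an emoji). The name to_surrogates is hypothetical, chosen only for this example.

```lua
-- Split a Supplementary Plane code point (0x10000..0x10FFFF) into its
-- UTF-16 high and low surrogates, following formula 1.
local function to_surrogates(c)
   local v = c - 0x10000
   local high = math.floor(v / 0x400) + 0xD800
   local low = v % 0x400 + 0xDC00
   return high, low
end

local high, low = to_surrogates(0x1F600)
print(high, low)  -- 55357 and 56832, i.e. 0xD83D and 0xDE00
```

These are the two values JavaScript reports for that emoji with charCodeAt(0) and charCodeAt(1).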

The code is here:

https://github.com/lilien1010/lua-bit
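
Independently of the linked repository, the UTF-8 table and formula 1 could be combined roughly as follows. This is only a sketch under stated assumptions, not the repository's code: the names decode and charCodeAt are chosen for illustration, the input must be valid UTF-8, the index is 0-based and counts UTF-16 units as in JavaScript, and nil is returned where JavaScript would return NaN.

```lua
-- Decode one UTF-8 character at byte i: returns code point and byte length.
-- Assumption: s is valid UTF-8 (no validation).
local function decode(s, i)
   local b1, b2, b3, b4 = string.byte(s, i, i + 3)
   if b1 < 0x80 then
      return b1, 1
   elseif b1 < 0xE0 then
      return (b1 - 0xC0) * 0x40 + (b2 - 0x80), 2
   elseif b1 < 0xF0 then
      return (b1 - 0xE0) * 0x1000 + (b2 - 0x80) * 0x40 + (b3 - 0x80), 3
   else
      return (b1 - 0xF0) * 0x40000 + (b2 - 0x80) * 0x1000
             + (b3 - 0x80) * 0x40 + (b4 - 0x80), 4
   end
end

-- JavaScript-like charCodeAt for a UTF-8 encoded Lua string.
local function charCodeAt(s, index)
   index = index or 0
   local unit, i = 0, 1  -- unit: UTF-16 index of the next character; i: byte position
   while i <= #s do
      local cp, n = decode(s, i)
      if cp < 0x10000 then
         -- BMP character: one UTF-16 unit, equal to the code point
         if unit == index then return cp end
         unit = unit + 1
      else
         -- Supplementary Plane: two UTF-16 units, computed with formula 1
         local v = cp - 0x10000
         if unit == index then return math.floor(v / 0x400) + 0xD800 end
         if unit + 1 == index then return v % 0x400 + 0xDC00 end
         unit = unit + 2
      end
      i = i + n
   end
   return nil  -- index out of range (JavaScript returns NaN here)
end

local s = "\228\189\160\195\177n"  -- "你ñn" as byte escapes
print(charCodeAt(s, 0), charCodeAt(s, 1), charCodeAt(s, 2))  -- 20320, 241, 110
```

The function scans from the start of the string on every call, which keeps it short but makes each lookup linear in the string length.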

Feedback & Bug Report


Thank you for reading. If you have a better idea, please share it.
