Preface
CSDN address: http://blog.csdn.net/game3108/article/details/52957669
I have recently been reading Apple's official Swift documentation, The Swift Programming Language, and am recording some notes here.
Extended Grapheme Clusters
Swift uses extended grapheme clusters as the representation of Character. The documentation says:
Every instance of Swift’s Character type represents a single extended grapheme cluster. An extended grapheme cluster is a sequence of one or more Unicode scalars that (when combined) produce a single human-readable character.
For the precise definition, see the Unicode standard document Grapheme Cluster Boundaries:
The Unicode standard provides an algorithm for determining grapheme cluster boundaries, with two variants: legacy grapheme clusters and extended grapheme clusters.
A legacy grapheme cluster is defined as a base (such as A or カ) followed by zero or more continuing characters. One way to think of this is as a sequence of characters that form a “stack”.
An extended grapheme cluster is the same as a legacy grapheme cluster, with the addition of some other characters. The continuing characters are extended to include all spacing combining marks, such as the spacing (but dependent) vowel signs in Indic scripts.
The exact boundary rules are described in detail in that document.
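To make the two definitions above more concrete, here is a minimal sketch, assuming Swift 4 or later (where `count` is available directly on String). The two sample strings are borrowed from the kind of examples the Unicode document uses: a base letter with a nonspacing mark, and a Devanagari letter with a spacing vowel sign. The variable names are only illustrative.

```swift
// "g" followed by U+0308 COMBINING DIAERESIS: a base plus a nonspacing mark,
// a single cluster under both the legacy and the extended rules.
let gDiaeresis = "\u{67}\u{308}"

// U+0937 DEVANAGARI LETTER SSA followed by U+093F DEVANAGARI VOWEL SIGN I,
// a spacing combining mark: a single cluster under the extended rules.
let ssi = "\u{937}\u{93F}"

// count walks extended grapheme clusters; unicodeScalars exposes the raw scalars.
let gClusters = gDiaeresis.count
let gScalars = gDiaeresis.unicodeScalars.count
let ssiClusters = ssi.count
let ssiScalars = ssi.unicodeScalars.count

print(gClusters, gScalars)      // 1 2
print(ssiClusters, ssiScalars)  // 1 2
```

In both cases two Unicode scalars are reported as one Character, which is the behaviour the quoted definition describes.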
Example
Take an example from Apple's documentation:
let precomposed: Character = "\u{D55C}" // 한
let decomposed: Character = "\u{1112}\u{1161}\u{11AB}" // ᄒ, ᅡ, ᆫ
// precomposed is 한, decomposed is 한
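Korean syllables can be decomposed into jamo and composed back again. The two values above are both a single Character, and because they are canonically equivalent, Swift treats them as the same character.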
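The equivalence can be checked directly. The following is a small sketch rather than anything taken from Apple's text; `sameCharacter` and the two scalar counters are names introduced here for illustration.

```swift
let precomposed: Character = "\u{D55C}"
let decomposed: Character = "\u{1112}\u{1161}\u{11AB}"

// Character equality is based on canonical equivalence, not on identical scalars.
let sameCharacter = (precomposed == decomposed)

// Going through String to count the underlying Unicode scalars of each value.
let precomposedScalars = String(precomposed).unicodeScalars.count
let decomposedScalars = String(decomposed).unicodeScalars.count

print(sameCharacter)       // true
print(precomposedScalars)  // 1
print(decomposedScalars)   // 3
```

The two values compare equal even though one is stored as a single scalar and the other as three.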
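Because a Character is a grapheme cluster rather than a fixed-size unit, getting the number of characters in a String means going through "".characters.count: first obtain the characters, then count them. (From Swift 4 onward, count is available directly on String.)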
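One consequence is that appending to a string does not always change its character count. The sketch below mirrors the "cafe" example from The Swift Programming Language and assumes Swift 4 or later, so it uses `count` on String directly instead of `characters.count`.

```swift
var word = "cafe"
let countBefore = word.count

// U+0301 COMBINING ACUTE ACCENT attaches to the final "e" instead of adding a character.
word += "\u{301}"
let countAfter = word.count
let wordScalarCount = word.unicodeScalars.count

print(countBefore, countAfter, wordScalarCount)  // 4 4 5
```

Because cluster boundaries have to be found by walking the string, the count is computed by iteration rather than read from a stored length.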
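Swift's String is built from 21-bit Unicode scalar values (comparable to UTF-32), while NSString in Objective-C is based on UTF-16 code units.
So the same string can report a different length once it is converted to NSString: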
let str = "Hello 😀" // one emoji after the space; U+1F600 is used here
str.characters.count // returns 7
(str as NSString).length // returns 8
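In other words, what you see is no longer what you get. When converting between a Swift String and an NSString, keep the Unicode encoding and the resulting difference in length in mind.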
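The same difference can be observed without Foundation, since NSString's length counts UTF-16 code units and Swift exposes that view as `utf16`. This is a minimal sketch assuming Swift 4 or later; `greeting` and the counter names are illustrative, and U+1F600 is just an arbitrary emoji outside the Basic Multilingual Plane.

```swift
let greeting = "Hello \u{1F600}"  // U+1F600: an arbitrary emoji outside the BMP

let characterCount = greeting.count              // extended grapheme clusters
let scalarCount = greeting.unicodeScalars.count  // Unicode scalar values
let utf16Count = greeting.utf16.count            // UTF-16 code units, as NSString counts
let utf8Count = greeting.utf8.count              // UTF-8 code units (bytes)

print(characterCount, scalarCount, utf16Count, utf8Count)  // 7 7 8 10
```

The emoji is one Character and one scalar, but it needs a surrogate pair in UTF-16, which is where the extra unit in the NSString length comes from.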
References
1. The Swift Programming Language
2. Why is Swift counting this Grapheme Cluster as two characters instead of one?
3. Grapheme Cluster Boundaries