(To be continued)
There are a few noteworthy high-level features about Julia’s strings:
- The built-in concrete type used for strings (and string literals) in Julia is String. It supports the full range of Unicode characters via the UTF-8 encoding; a transcode() function is provided to convert to and from other Unicode encodings.
- All string types are subtypes of the abstract type AbstractString, and external packages can define additional AbstractString subtypes (e.g. for other encodings). If you define a function that expects a string argument, declare the argument type as AbstractString so that it accepts any string type.
- Like C and Java, but unlike most dynamic languages, Julia has a first-class type representing a single character, called Char. A Char is written with single quotes and is a special kind of 32-bit primitive type whose numeric value represents a Unicode code point.
- As in Java and Python, strings are immutable: the value of an AbstractString object cannot be changed. To construct a different string value, you build a new string from parts of other strings.
- Conceptually, a string is a partial function from indices to characters: for some index values no character is returned, and an exception is thrown instead. This allows efficient indexing into strings by the byte index of the encoded representation rather than by a character index, which cannot be implemented both efficiently and simply for variable-width Unicode encodings.
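The properties listed above can be checked with a short sketch. `shout` is a hypothetical helper introduced only for illustration. The exact error types raised by the failed mutation and by the mid-character index differ between Julia versions, so the sketch only records whether an error occurred.

```julia
# Hypothetical helper: declared with AbstractString so it accepts any string type
shout(s::AbstractString) = uppercase(s) * "!"

s = "∀x"                      # '∀' occupies bytes 1-3 in UTF-8, 'x' occupies byte 4
is_string = s isa String      # string literals have the concrete type String
is_subtype = String <: AbstractString
sub = SubString(s, 4, 4)      # a SubString is another AbstractString subtype
shouted = (shout(s), shout(sub))

# Strings are immutable: assigning into a String raises an error
mutation_failed = try
    s[1] = 'A'
    false
catch
    true
end

# A new string is built from parts of an existing one; s itself is unchanged
t = s[4:end] * "y"

# Indexing is by byte: byte 2 falls inside '∀', so it is not a valid index
bad_index_failed = try
    s[2]
    false
catch
    true
end
```

Because `shout` only asks for an AbstractString, it works unchanged on both the String and the SubString.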
4.1 The Char Type
A Char value represents a single character. It occupies 32 bits and is written with single quotes; its numeric value is the character's Unicode code point:
julia> 'x'
'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
julia> typeof(ans)
Char
A Char can easily be converted to its integer value, i.e. its code point:
julia> Int('x')
120
julia> typeof(ans)
Int64
On 32-bit architectures, typeof(ans) returns Int32 instead. Conversely, an integer can be converted back to a Char:
julia> Char(120)
'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
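For valid code points the two conversions are inverses, so a round trip through Int recovers the original characters. A small sketch:

```julia
s = "∀x∃y"
codes = Int.(collect(s))      # code point of each character
back = join(Char.(codes))     # convert the integers back and join them into a String
```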
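Not every integer is a valid Unicode code point, but for performance, the Char() conversion does not check whether its argument is valid. To check whether an integer is a valid code point, use the isvalid() function: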
julia> Char(0x110000)
'\U110000': Unicode U+110000 (category Cn: Other, not assigned)
julia> isvalid(Char, 0x110000)
false
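When integers come from untrusted input, one option is to validate them before converting. `checked_char` below is a hypothetical helper, not a Base function; it combines the `isvalid(Char, n)` check shown above with `Char(n)` and rejects negative values up front.

```julia
# Hypothetical helper (not in Base): convert n to a Char only if it is a valid code point
function checked_char(n::Integer)
    (n >= 0 && isvalid(Char, n)) || throw(ArgumentError("invalid Unicode code point: $n"))
    return Char(n)
end
```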
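At the time of writing, the valid Unicode code points are U+00 through U+d7ff and U+e000 through U+10ffff. (The excluded block U+d800 through U+dfff is reserved for UTF-16 surrogates.)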
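The two ranges can be cross-checked against `isvalid` itself. This sketch counts every integer from 0 to 0x10ffff that passes the check; the expected total is 0xd800 + (0x10ffff - 0xe000 + 1) = 55296 + 1056768 = 1112064.

```julia
# Count the integers in 0:0x10ffff that isvalid accepts as code points
n_valid = count(n -> isvalid(Char, n), 0:0x10ffff)

# Combined size of the two ranges: U+00..U+d7ff and U+e000..U+10ffff
expected = 0xd800 + (0x10ffff - 0xe000 + 1)

# Boundary checks around the excluded block and the upper limit
boundaries = [isvalid(Char, n) for n in (0xd7ff, 0xd800, 0xdfff, 0xe000, 0x10ffff, 0x110000)]
```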
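You can write any Unicode character in single quotes using \u followed by up to four hexadecimal digits, or \U followed by up to eight hexadecimal digits (in fact, the largest valid value needs only six):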
julia> '\u0'
'\0': ASCII/Unicode U+0000 (category Cc: Other, control)
julia> '\u78'
'x': ASCII/Unicode U+0078 (category Ll: Letter, lowercase)
julia> '\u2200'
'∀': Unicode U+2200 (category Sm: Symbol, math)
julia> '\U10ffff'
'\U10ffff': Unicode U+10ffff (category Cn: Other, not assigned)
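Since these escapes are just another way of writing a code point, each literal equals `Char` applied to the same hexadecimal number, and leading zeros are optional. A quick sketch, which also shows that the escapes work inside double-quoted string literals:

```julia
# Each escape denotes the same Char as converting the hex code point directly
pairs_ok = [
    '\u0'      == Char(0x0),
    '\u78'     == Char(0x78) == 'x',
    '\u2200'   == Char(0x2200) == '∀',
    '\U10ffff' == Char(0x10ffff),
]

# Four and eight digits are upper limits, not requirements
same_x = '\u0078' == '\u78' == '\U00000078'

# The same escapes work inside string literals
s = "\u2200x"
```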
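Julia uses your system's locale and language settings to determine which characters can be printed as-is and which must be printed using the escaped \u or \U forms. In addition to these Unicode escapes, all of C's traditional escaped input forms can also be used: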
julia> Int('\0')
0
julia> Int('\t')
9
julia> Int('\n')
10
julia> Int('\e')
27
julia> Int('\x7f')
127
julia> Int('\177')
127
julia> Int('\xff')
255
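The C-style escapes can be checked against their numeric values in one place. This sketch leaves out `'\xff'`, because in newer Julia versions a `\x` escape above 0x7f may denote a raw byte rather than the code point U+00ff, so its result depends on the version.

```julia
# Pair each C-style escape with the integer value shown in the REPL session above
escapes = ['\0' => 0, '\t' => 9, '\n' => 10, '\e' => 27, '\x7f' => 127, '\177' => 127]
all_match = all(Int(c) == v for (c, v) in escapes)

# Hex and octal forms name the same character (DEL)
same_del = '\x7f' == '\177'
```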
Finally, Char values can be compared, and a limited amount of arithmetic can be done with them:
julia> 'A' < 'a'
true
julia> 'A' <= 'a' <= 'Z'
false
julia> 'A' <= 'X' <= 'Z'
true
julia> 'x' - 'a'
23
julia> 'A' + 1
'B': ASCII/Unicode U+0042 (category Lu: Letter, uppercase)
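Comparisons plus Char - Char (giving an Int) and Char + Int (giving a Char) are enough for simple character transformations. As an illustration, here is a ROT13 cipher restricted to ASCII letters; `rotate_letter` and `rot13` are hypothetical helpers, not part of Base.

```julia
# Hypothetical helper: rotate an ASCII letter by k positions using Char arithmetic
function rotate_letter(c::Char, k::Integer)
    if 'a' <= c <= 'z'
        return 'a' + mod(c - 'a' + k, 26)   # c - 'a' is an Int offset; 'a' + Int is a Char
    elseif 'A' <= c <= 'Z'
        return 'A' + mod(c - 'A' + k, 26)
    else
        return c                            # leave non-letters untouched
    end
end

# Hypothetical helper: apply a 13-position rotation to every character of a string
rot13(s::AbstractString) = map(c -> rotate_letter(c, 13), s)
```

Applying `rot13` twice returns the original text, since 13 + 13 = 26 wraps around the alphabet.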