Mysql字符集

字符集,字符序的基本概念及其联系

在数据的存储上,MySQL提供了不同的字符集支持。而在数据的对比操作上,则提供了不同的字符序支持。

MySQL提供了不同级别的设置,包括server级、database级、table级、column级,可以提供非常精准的设置。

什么是字符集、字符序?简单的来说:

  1. 字符集(character set):定义了字符以及字符的编码。
  2. 字符序(collation):定义了字符的比较规则。

Example:

  1. 有四个字符:ABab,这四个字符的编码分别是A = 0, B = 1, a = 2, b = 3。这里的字符 + 编码就构成了字符集(character set)。
  2. 如果我们想比较两个字符的大小呢?比如AB,或者ab,最直观的比较方式是采用它们的编码,比如因为0 < 1,所以 A < B
  3. 另外,对于Aa,虽然它们编码不同,但我们觉得大小写字符应该是相等的,也就是说 A == a
  4. 这上面定义了两条比较规则,这些比较规则的集合就是collation
  5. 同样是大写字符、小写字符,则比较他们的编码大小;
  6. 如果两个字符为大小写关系,则它们相等。

MySQL支持的字符集、字符序

MySQL支持多种字符集 与 字符序。

  • 一个字符集对应至少一种字符序(一般是1对多)。

  • 两个不同的字符集不能有相同的字符序。

  • 每个字符集都有默认的字符序。

查看支持的字符集

可以通过以下方式查看MYSQL支持的字符集。

  1. show character set ;
  1. armscii8 ARMSCII-8 Armenian armscii8_general_ci 1
  2. ascii US ASCII ascii_general_ci 1
  3. big5 Big5 Traditional Chinese big5_chinese_ci 2
  4. binary Binary pseudo charset binary 1
  5. cp1250 Windows Central European cp1250_general_ci 1
  6. cp1251 Windows Cyrillic cp1251_general_ci 1
  7. cp1256 Windows Arabic cp1256_general_ci 1
  8. cp1257 Windows Baltic cp1257_general_ci 1
  9. cp850 DOS West European cp850_general_ci 1
  10. cp852 DOS Central European cp852_general_ci 1
  11. cp866 DOS Russian cp866_general_ci 1
  12. cp932 SJIS for Windows Japanese cp932_japanese_ci 2
  13. dec8 DEC West European dec8_swedish_ci 1
  14. eucjpms UJIS for Windows Japanese eucjpms_japanese_ci 3
  15. euckr EUC-KR Korean euckr_korean_ci 2
  16. gb18030 China National Standard GB18030 gb18030_chinese_ci 4
  17. gb2312 GB2312 Simplified Chinese gb2312_chinese_ci 2
  18. gbk GBK Simplified Chinese gbk_chinese_ci 2
  19. geostd8 GEOSTD8 Georgian geostd8_general_ci 1
  20. greek ISO 8859-7 Greek greek_general_ci 1
  21. hebrew ISO 8859-8 Hebrew hebrew_general_ci 1
  22. hp8 HP West European hp8_english_ci 1
  23. keybcs2 DOS Kamenicky Czech-Slovak keybcs2_general_ci 1
  24. koi8r KOI8-R Relcom Russian koi8r_general_ci 1
  25. koi8u KOI8-U Ukrainian koi8u_general_ci 1
  26. latin1 cp1252 West European latin1_swedish_ci 1
  27. latin2 ISO 8859-2 Central European latin2_general_ci 1
  28. latin5 ISO 8859-9 Turkish latin5_turkish_ci 1
  29. latin7 ISO 8859-13 Baltic latin7_general_ci 1
  30. macce Mac Central European macce_general_ci 1
  31. macroman Mac West European macroman_general_ci 1
  32. sjis Shift-JIS Japanese sjis_japanese_ci 2
  33. swe7 7bit Swedish swe7_swedish_ci 1
  34. tis620 TIS620 Thai tis620_thai_ci 1
  35. ucs2 UCS-2 Unicode ucs2_general_ci 2
  36. ujis EUC-JP Japanese ujis_japanese_ci 3
  37. utf16 UTF-16 Unicode utf16_general_ci 4
  38. utf16le UTF-16LE Unicode utf16le_general_ci 4
  39. utf32 UTF-32 Unicode utf32_general_ci 4
  40. utf8 UTF-8 Unicode utf8_general_ci 3
  41. utf8mb4 UTF-8 Unicode utf8mb4_0900_ai_ci 4

也可以通过以下方式查看MYSQL支持的字符集。

  1. use information_schema;
  2. select * from CHARACTER_SETS;
  1. big5 big5_chinese_ci Big5 Traditional Chinese 2
  2. dec8 dec8_swedish_ci DEC West European 1
  3. cp850 cp850_general_ci DOS West European 1
  4. hp8 hp8_english_ci HP West European 1
  5. koi8r koi8r_general_ci KOI8-R Relcom Russian 1
  6. latin1 latin1_swedish_ci cp1252 West European 1
  7. latin2 latin2_general_ci ISO 8859-2 Central European 1
  8. swe7 swe7_swedish_ci 7bit Swedish 1
  9. ascii ascii_general_ci US ASCII 1
  10. ujis ujis_japanese_ci EUC-JP Japanese 3
  11. sjis sjis_japanese_ci Shift-JIS Japanese 2
  12. hebrew hebrew_general_ci ISO 8859-8 Hebrew 1
  13. tis620 tis620_thai_ci TIS620 Thai 1
  14. euckr euckr_korean_ci EUC-KR Korean 2
  15. koi8u koi8u_general_ci KOI8-U Ukrainian 1
  16. gb2312 gb2312_chinese_ci GB2312 Simplified Chinese 2
  17. greek greek_general_ci ISO 8859-7 Greek 1
  18. cp1250 cp1250_general_ci Windows Central European 1
  19. gbk gbk_chinese_ci GBK Simplified Chinese 2
  20. latin5 latin5_turkish_ci ISO 8859-9 Turkish 1
  21. armscii8 armscii8_general_ci ARMSCII-8 Armenian 1
  22. utf8 utf8_general_ci UTF-8 Unicode 3
  23. ucs2 ucs2_general_ci UCS-2 Unicode 2
  24. cp866 cp866_general_ci DOS Russian 1
  25. keybcs2 keybcs2_general_ci DOS Kamenicky Czech-Slovak 1
  26. macce macce_general_ci Mac Central European 1
  27. macroman macroman_general_ci Mac West European 1
  28. cp852 cp852_general_ci DOS Central European 1
  29. latin7 latin7_general_ci ISO 8859-13 Baltic 1
  30. cp1251 cp1251_general_ci Windows Cyrillic 1
  31. utf16 utf16_general_ci UTF-16 Unicode 4
  32. utf16le utf16le_general_ci UTF-16LE Unicode 4
  33. cp1256 cp1256_general_ci Windows Arabic 1
  34. cp1257 cp1257_general_ci Windows Baltic 1
  35. utf32 utf32_general_ci UTF-32 Unicode 4
  36. binary binary Binary pseudo charset 1
  37. geostd8 geostd8_general_ci GEOSTD8 Georgian 1
  38. cp932 cp932_japanese_ci SJIS for Windows Japanese 2
  39. eucjpms eucjpms_japanese_ci UJIS for Windows Japanese 3
  40. gb18030 gb18030_chinese_ci China National Standard GB18030 4
  41. utf8mb4 utf8mb4_0900_ai_ci UTF-8 Unicode 4

注意

当使用SHOW CHARACTER SET查看时,也可以加上WHERELIKE限定条件。

例子一:使用WHERE限定条件。

  1. mysql> SHOW CHARACTER SET WHERE Charset="utf8";
  2. +---------+---------------+-------------------+--------+
  3. | Charset | Description | Default collation | Maxlen |
  4. +---------+---------------+-------------------+--------+
  5. | utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
  6. +---------+---------------+-------------------+--------+
  7. 1 row in set (0.00 sec)

例子二:使用LIKE限定条件。

  1. mysql> SHOW CHARACTER SET LIKE "utf8%";
  2. +---------+---------------+--------------------+--------+
  3. | Charset | Description | Default collation | Maxlen |
  4. +---------+---------------+--------------------+--------+
  5. | utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
  6. | utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
  7. +---------+---------------+--------------------+--------+
  8. 2 rows in set (0.00 sec)

查看支持的字符序

类似的,可以通过如下方式查看MYSQL支持的字符序。

方式一:通过SHOW COLLATION进行查看。

可以看到,utf8字符集有超过10种字符序。通过Default的值是否为Yes,判断是否默认的字符序。

  1. mysql> SHOW COLLATION WHERE Charset = 'utf8';
  2. +--------------------------+---------+-----+---------+----------+---------+
  3. | Collation | Charset | Id | Default | Compiled | Sortlen |
  4. +--------------------------+---------+-----+---------+----------+---------+
  5. | utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
  6. | utf8_bin | utf8 | 83 | | Yes | 1 |
  7. ...略

方式二:查询information_schema.COLLATIONS

  1. mysql> USE information_schema;
  2. mysql> SELECT * FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8";
  3. +--------------------------+--------------------+-----+------------+-------------+---------+
  4. | COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
  5. +--------------------------+--------------------+-----+------------+-------------+---------+
  6. | utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
  7. | utf8_bin | utf8 | 83 | | Yes | 1 |
  8. | utf8_unicode_ci | utf8 | 192 | | Yes | 8 |

字符序的命名规范

字符序的命名,以其对应的字符集作为前缀,如下所示。比如字符序utf8_general_ci,标明它是字符集utf8的字符序。

更多规则可以参考 官方文档

  1. MariaDB [information_schema]> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8" limit 2;
  2. +--------------------+-----------------+
  3. | CHARACTER_SET_NAME | COLLATION_NAME |
  4. +--------------------+-----------------+
  5. | utf8 | utf8_general_ci |
  6. | utf8 | utf8_bin |
  7. +--------------------+-----------------+
  8. 2 rows in set (0.00 sec)