美文网首页
字符集,排序规则,Unicode

字符集,排序规则,Unicode

作者: szn好色仙人 | 来源:发表于2019-12-22 21:13 被阅读0次

序言

show character set;
--查看mysql支持的字符集,部分输出如下
Character Describption Default collation Maxlen
ascii US ASCII ascii_general_ci 1
big5 Big5 Traditional Chinese(繁体中文) big5_chinese_ci 2
binary Binary pseudo charset binary 1
gb18030 China National Standard GB18030 gb18030_chinese_ci 4
gb2312 GB2312 Simplified Chinese gb2312_chinese_ci 2
gbk GBK Simplified Chinese gbk_chinese_ci 2
utf16 UTF-16 Unicode utf16_general_ci 4
utf16le UTF-16LE Unicode utf16le_general_ci 4
utf32 UTF-32 Unicode utf32_general_ci 4
utf8 UTF-8 Unicode utf8_general_ci 3
utf8mb4 UTF-8 Unicode utf8mb4_0900_ai_ci 4
show collation;
--查看mysql支持的collation

指定字符集与排序规则

服务器的字符集与排序规则

  • character_set_server的默认值为utf8mb4

    show variables like 'character_set_server';
    --输出的Value列为utf8mb4
    
  • my.ini中可以设置character_set_server的值:

    character-set-server=utf8;
    --在my.ini中添加上述行,重启服务,即可设置character_set_server的值
    
  • 在运行过程中,可以更改character_set_server的值:

    set character_set_server = gbk;
    
  • 如果未在create database语句中指定字符集和排序规则,则会将character_set_servercollation_server作为默认值

  • 若只指定了字符集没有指定排序规则,则使用字符集对应的默认的排序规则

数据库的字符集与排序规则

  • 可以在create databasealter database时指定字符集和排序规则

    create database d character set gbk collate gbk_bin;
    alter database d character set gb2312 collate gb2312_bin;
    
    use d;
    SELECT @@character_set_database, @@collation_database;
    --此sql语句用于输出当前数据库的字符集与排序规则
    
  • 当前数据库的字符集和排序规则,可以通过查看character_set_databasecollation_database来确定。若当前没有默认数据库,则character_set_databasecollation_databasecharacter_set_servercollation_server相同

    create database d0 character set gbk collate gbk_bin;
    create database d1 character set utf8 collate utf8_bin;
    
    use d0;
    show variables like 'character_set_database';
    --输出的character_set_database值为gbk
    
    use d1;
    show variables like 'character_set_database';
    --输出的character_set_database值为utf8
    
  • 数据库的字符集和排序规则会产生以下影响:

    • create table未指定字符集和排序规则时,则采用数据库的字符集和排序规则

    • load data未指定字符集和排序规则,则使用数据库的字符集来解释文件中的信息

    • 创建的存储过程和函数未指定字符集和排序规则,则使用数据库的字符集和排序规则

      use d0; --create database d0 character set gbk collate gbk_bin;
      delimiter \
      
      create procedure p0(in str varchar(111))
      begin
              select charset(str);
      end\
      
      delimiter ;
      
      call p0("szn");
      --输出:gbk
      
      use d1; --database d1 character set utf8 collate utf8_bin;
      delimiter \
      
      create procedure p1(in str varchar(111))
      begin
              select charset(str);
      end\
      
      delimiter ;
      
      call p1("szn");
      --输出:utf8
      
      use d1; --database d1 character set utf8 collate utf8_bin;
      delimiter \
      
      create procedure p_set(in str varchar(111) character set gb2312)
      begin
              select charset(str);
      end\
      
      delimiter ;
      
      call p_set("szn");
      --输出:gb2312
      

表的字符集与排序规则

  • 可以在create tablealter table时指定表的字符集和排序规则

    create table t0(name varchar(1024)) character set gbk collate gbk_bin;
    alter table t0 character set utf8;
    
  • 若未指定表的字符集和排序规则,则使用对应数据库的字符集和排序规则

  • 若表中的列未指定字符集和排序规则,则对应的列使用表的字符集和排序规则

列的字符集与排序规则

  • create tablealter table时指定列的字符集和排序规则

    create table t2(name varchar(1024) character set gbk collate gbk_bin);
    alter table t2 modify column name varchar(1024) character set utf8 collate utf8_bin;
    
  • 使用alter table将列的字符集进行更改时,若字符集不兼容,则可能会丢失数据

字符串字面量的字符集与排序规则

  • 字符串字面量的默认字符集和排序规则由character_set_connectioncollation_connection指定

  • 字符串字面量可以设定字符集与排序规则:

    show variables like 'character_set_connection';
    --输出值:utf8
    
    set @b = "szn";
    select charset(@b);
    --输出值:utf8
    
    set @c = _gbk"哈" COLLATE gbk_bin;
    --字符串字面量前指定编码方式,这种方式称为介绍器(Character Set Introducers)
    
    select charset(@c);
    --输出值:gbk
    
  • 字符串字面量指定字符集并不会更改其值,解析器最终仍按照character_set_connection进行处理

    set @c0 = _gbk"哈" COLLATE gbk_bin;
    set @c1 = _utf8"哈" COLLATE utf8_bin;
    select hex(@c0), hex(@c1);
    --两个输出均为"哈"的utf8编码 E59388
    

Character Set Introducers

  • 字符串字面量,十六进制字面量,位字面量都有一个可选的字符集和排序规则设置,这称为介绍器

  • 介绍器告诉解析器后面字符串使用的字符集,但是不会改变字符串的值

    set @s0 = _utf8 0xE59388;
    set @s1 = _gbk 0xB9FE;
    set @s2 = _gbk X'B9FE';
    set @s3 = _utf8 b'111001011001001110001000';
    set @s4 = _utf8 0b111001011001001110001000;
    select @s0, @s1, @s2, @s3, @s4;
    --输出:哈,哈,哈,哈, 哈
    
    SET @v1 = X'000D' | X'0BC0';
    --@V1是bigint类型
    
    SET @v2 = _binary X'000D' | X'0BC0';
    --@v2是字符串
    
    SELECT HEX(@v1), HEX(@v2);
    --输出:BCD, 0BCD
    
    select @v1 + 1, @v2 + 1, "0BCD" + "1";.
    --输出:3022, 1, 1
    

连接的字符集与排序规则

  • 每个客户端的连接都有特定且可更改的字符集和排序规则

  • 客户端发起sql查询,至结果返回到客户端:

    • character_set_client表明客户端发送过来的sql语句的字符集
    • 服务器将接收到的sql语句的字符集转换为character_set_connection
    • 服务器将sql的执行结果的字符集转换为character_set_results
  • character_set_client不允许一些字符集设置:

    ucs2
    utf16
    utf16le
    utf32
    
  • set character set会同时将character_set_clientcharacter_set_results设定为给定值,并且将character_set_connection设置为character_set_database的值

    set character_set_client = ascii;
    set character_set_results = big5;
    set character_set_database = gb2312;
    set character_set_connection = gbk;
    
    set character set latin2;
    show variables like 'character%';
    
    Variable_name Value
    character_set_client latin2
    character_set_connection gb2312
    character_set_database gb2312
    character_set_results latin2
  • my.ini中添加default-character-set=latin2,重启服务,将会同时更改character_set_client,character_set_connection,character_set_resultslatin2

  • 可以在登录mysql时:mysql -u root -p --default-character-set=latin1

  • set names gb2312;
    --等效于下面三句代码
    
    set character_set_client = gb2312;
    set character_set_results = gb2312;
    set character_set_connection = gb2312;
    

Unicode支持

​ The Unicode Standard includes characters from the Basic Multilingual(使用多种语言的) Plane (BMP) and supplementary(追加的) characters that lie outside the BMP.

​ BMP characters have these characteristics:

  • Their code point values are between 0 and 65535 (or U+0000 and U+FFFF).
  • They can be encoded in a variable-length encoding using 8, 16, or 24 bits (1 to 3 bytes).
  • They can be encoded in a fixed-length encoding using 16 bits (2 bytes).
  • They are sufficient(充分的) for almost all characters in major(主要的) languages.

​ Supplementary characters lie outside the BMP:

  • Their code point values are between U+10000 and U+10FFFF).
  • 占用的空间比BMP内的字符大,最多占用4字节

mysql支持的unicode

字符集 单个字符占用空间 支持的字符 描述
utf8mb4 1到4字节 BMP and supplementary A UTF-8 encoding of the Unicode character set
utf8mb3 1到3字节 BMP only A UTF-8 encoding of the Unicode character set
utf8 1到3字节 BMP only An alias for utf8mb3.
ucs2 2字节 BMP and supplementary The UCS-2 encoding of the Unicode character set
utf16 2或4字节 BMP and supplementary The UTF-16 encoding for the Unicode character set . Like ucs2 but with an extension for supplementary characters.
utf16le 2或4字节 BMP and supplementary The UTF-16LE encoding for the Unicode character set. Like utf16 but little-endian rather than big-endian.
utf32 4字节 BMP and supplementary The UTF-32 encoding for the Unicode character set
  • utf8当前是utf8mb3的别称,但是utf8mb3将在mysql将来的版本被删除。有时候utf8会变成utf8mb4的别称,所以为了避免模糊不清,请考虑使用utf8mb4

  • 对于BMP字符,utf8mb4utf8mb3具有相同编码值,占用的空间也相同。即utf8mb4utf8mb3的超集

  • 对于补充字符,utf8mb4需要四个字节来存储

相关文章

网友评论

      本文标题:字符集,排序规则,Unicode

      本文链接:https://www.haomeiwen.com/subject/upcanctx.html