mysql多列索引

作者: 小东班吉 | 来源:发表于2019-08-01 14:34 被阅读0次

索引
MySql 数据查询优化
MySQL 多列索引
MySQL多列索引
mysql多列索引
mysql 查询效率优化之常用索引的几种类型新手使用教程，少
Mysql索引失效
MySql 索引使用总结-基础篇
49-MySQL-索引的创建与删除
MySql 联合索引分析

mysql多列索引

复合索引

这几天面试了一家公司，面试官问了个关于复合索引的，本以为对索引数据结构很了解了，谁知道遭遇滑铁卢，这里也记录下：

abc三列，建立了一个复合索引key1 (a,b,c)，where 条件a=2 and b>2 and c=2 能用到复合索引的哪部分，为什么，如果从数据结构上来解释呢？

首先key1复合索引相当于建立了a，ab，abc三个索引，根据上一篇range优化里说的，对于btree索引，只要是使用了=，<=>，或者is null操作符，MySQL就尝试使用额外的索引键来确定区间，如果操作符是>，<，>=，<=，!=，<>，between或like，mysql也会使用它，但不会再考虑其他的索引键。所以很容易就得出，上面的where条件用到的索引部分是ab这个部分，这个我们可以通过执行计划看看使用到的索引的key的长度。

举个栗子:

create table test (
    id int unsigned auto_increment,
    a int not null default 0,
    b int not null default 0,
    c int not null default 0,
    d int not null default 0,
    e int not null default 0,
    primary key (id),
    key key1 (a,b,c)
)engine=innodb default charset=utf8;

//加了点数据
mysql> select *from test ;
+----+---+---+---+---+---+
| id | a | b | c | d | e |
+----+---+---+---+---+---+
|  1 | 0 | 0 | 1 | 0 | 0 |
|  2 | 1 | 1 | 0 | 0 | 0 |
|  3 | 1 | 0 | 1 | 1 | 0 |
|  4 | 1 | 2 | 3 | 4 | 0 |
|  5 | 1 | 3 | 2 | 1 | 0 |
|  6 | 2 | 6 | 4 | 1 | 0 |
+----+---+---+---+---+---

mysql> explain select *from test where a=1 and b>1 and c=2;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                 |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
|  1 | SIMPLE      | test  | range | key1          | key1 | 8       | NULL |    2 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
1 row in set (0.01 sec)

可以看到上面使用到的索引是key1这个索引，索引的长度是8，我们计算下这个索引的长度，abc都是int类型的占用4个字节，索引的长度是8个字节，可以证明只用了ab这个部分的索引。

使用optimizer_trace跟踪查看（我截取了优化器选择的部分，可以看到选择的正是ab这部分）

"analyzing_range_alternatives": {
  "range_scan_alternatives": [
    {
      "index": "key1",
      "ranges": [
        "1 <= a <= 1 AND 1 < b"
      ],
      "index_dives_for_eq_ranges": true,
      "rowid_ordered": false,
      "using_mrr": false,
      "index_only": false,
      "rows": 2,
      "cost": 3.41,
      "chosen": true
    }
  ],
  "analyzing_roworder_intersect": {
    "usable": false,
    "cause": "too_few_roworder_scans"
  }
},
"chosen_range_access_summary": {
  "range_access_plan": {
    "type": "range_scan",
    "index": "key1",
    "rows": 2,
    "ranges": [
      "1 <= a <= 1 AND 1 < b" //这里可以看到选了ab
    ]
  },
  "rows_for_plan": 2,
  "cost_for_plan": 3.41,
  "chosen": true
}
{
  "attaching_conditions_to_tables":{
    "original_condition":"((`test`.`c` = 2) and (`test`.`a` = 1) and (`test`.`b` > 1))",
    "attached_conditions_computation":[
      {
        "access_type_changed":{
          "table":"`test`",
          "index":"key1",
          "old_type":"ref",
          "new_type":"range", //这是决定了用那种扫描类型
          "cause":"uses_more_keyparts"
        }
      }
    ],
    "attached_conditions_summary":[
      {
        "table":"`test`",
        "attached":"((`test`.`c` = 2) and (`test`.`a` = 1) and (`test`.`b` > 1))"
      }
    ]
  }
}

结果已经知道了，可心里不免有点疑问，那为什么不会用到c那部分呢？对于联合索引来说abc肯定是最优的。我个人的理解是这样的：

在btree多列索引中，abc,b要是有了范围查找，c就不会使用索引了，相当于只使用了ab两个部分的索引，为什么呢，是因为在查找节点的时，先匹配a找到a后再去匹配b，但b是一个范围查找，并不能定位到下一级的索引节点，那会去匹配b的范围，但b的范围匹配出来可能存在多个节点中，这每个节点并不一定是有序的，如果要再去匹配c，用上c的部分索引，那就不是顺序io了，成了随机io浪费性能，所以mysql才基于这个原因，只会用上ab这部分索引，就是因为他们是顺序io，读取速度快，io次数少。