序言

学习了慕课网上的MySQL数据库优化课程，链接为：https://www.imooc.com/video/3688，记录下自己学习过程中的收获。

数据库优化的目的
- 避免出现网页访问错误
  - 由于数据库连接超时产生页面5XX的错误
  - 由于慢查询造成页面无法加载
  - 由于阻塞造成数据无法提交
- 增加数据库的稳定性
  - 许多数据库问题都是由于低效的查询引起的
- 优化用户体验
  - 流畅页面的访问速度
  - 良好的网站功能体验
数据库优化可以从哪几方面进行：(成本从低到高，效果从高到底)
- SQL及索引优化
- 数据库表结构优化
- 系统配置优化
- 硬件优化

SQL语句优化

如何发现有问题的SQL：使用MySQL慢查询日志对效率有问题的SQL进行监控

1
2
3
4
5
6
7 mysql> show variables like 'slow_query_log';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| slow_query_log | ON    |
+----------------+-------+
1 row in set, 1 warning (0.00 sec)

1
2
3
4
5
6
7
8
9 mysql> show variables like 'slow%';
+---------------------+--------------------------+
| Variable_name       | Value                    |
+---------------------+--------------------------+
| slow_launch_time    | 2                        |
| slow_query_log      | ON                       |
| slow_query_log_file | DESKTOP-EKS47TB-slow.log |
+---------------------+--------------------------+
3 rows in set, 1 warning (0.00 sec)

慢查日志包含的内容：
- 执行SQL的主机信息
- SQL执行信息
- SQL执行时间
- SQL的内容
慢查询日志分析工具
- mysqldumpslow
- pt-query-digest
  - 查询次数多且每次查询占用时间长的SQL
  - IO大的SQL
  - 未命中索引的SQL
explain查询和分析查询语句：

mysql> explain select customer_id, first_name, last_name from customer;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | customer | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  599 |   100.00 | NULL  |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

table 数据来自哪张表
type 连接使用的类型，最好到最坏的链接类型分别为const,eq_reg,ref,range,index,ALL.
possible_keys：显示可能应用在这张表上的索引
key：实际使用的索引
key_len：使用的索引长度
ref：显示索引哪一列被使用了，如果可能的话，是一个常数
rows：Mysql认为必须检查的用来返回请求的行数
extra
- Using filesort：查询需要优化，MySQL需要进行额外的步骤来发现如何对返回的行排序。它根据连接类型以及存储排序的键值和匹配条件的全部行的行指针来排序全部行。
- Using temporary：需要优化，MySQL此时创建了一个临时表来存储结果，这通常发生在对不同的列集进行ORDER by上，而不是group by。

Count()和Max()优化方法

查询最后支付时间：

mysql> explain select max(payment_date) from payment \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: payment
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 16086
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

在payment字段上创建索引：

mysql> create index idx_paydate on payment(payment_date);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> explain select max(payment_date) from payment \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: NULL
   partitions: NULL
         type: NULL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Select tables optimized away
1 row in set, 1 warning (0.00 sec)

在一条SQL中同时查出2006年和2007年电影的数量-优化count()函数

错误实例：

mysql> select count(release_year='2006' or release_year='2007') from film;
+---------------------------------------------------+
| count(release_year='2006' or release_year='2007') |
+---------------------------------------------------+
|                                              1000 |
+---------------------------------------------------+
1 row in set (0.01 sec)

正确的：

mysql> select count(release_year='2006' or null) as '2006电影数量', count(release_year='2007' or null) as '2007电影数量' from film;
+--------------+--------------+
| 2006电影数量 | 2007电影数量 |
+--------------+--------------+
|         1000 |            0 |
+--------------+--------------+
1 row in set (0.00 sec)

‘count(*)计算空值，count(id)不计算空值’

子查询优化
- 通常情况下，需要把子查询优化为join查询，但在优化时要注意关联键是否有一对多的关系，要注意重复数据。

优化group by查询,使用了临时表

mysql> explain select actor.first_name, actor.last_name, COUNT(*) from sakila.film_actor inner join sakila.actor using(actor_id) group by film_actor.actor_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 200
     filtered: 100.00
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: film_actor
   partitions: NULL
         type: ref
possible_keys: PRIMARY,idx_fk_film_id
          key: PRIMARY
      key_len: 2
          ref: sakila.actor.actor_id
         rows: 27
     filtered: 100.00
        Extra: Using index
2 rows in set, 1 warning (0.00 sec)

优化之后：

mysql> explain select actor.first_name, actor.last_name, c.cnt from sakila.actor inner join (select actor_id, COUNT(*) as cnt from sakila.film_actor group by actor_id) as c using(actor_id)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: actor
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 200
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
   partitions: NULL
         type: ref
possible_keys: <auto_key0>
          key: <auto_key0>
      key_len: 2
          ref: sakila.actor.actor_id
         rows: 27
     filtered: 100.00
        Extra: NULL
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: film_actor
   partitions: NULL
         type: index
possible_keys: PRIMARY,idx_fk_film_id
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 5462
     filtered: 100.00
        Extra: Using index
3 rows in set, 1 warning (0.00 sec)

优化Limit查询：limit常用于分页处理，会伴随着order by从句使用，因此大多数时候会使用Filesorts这样会造成大量的IO问题。
执行结果：

mysql> select film_id, description from sakila.film order by title limit 50, 5;
+---------+---------------------------------------------------------------------------------------------------------------------------------+
| film_id | description                                                                                                                     |
+---------+---------------------------------------------------------------------------------------------------------------------------------+
|      51 | A Insightful Panorama of a Forensic Psychologist And a Mad Cow who must Build a Mad Scientist in The First Manned Space Station |
|      52 | A Thrilling Documentary of a Composer And a Monkey who must Find a Feminist in California                                       |
|      53 | A Epic Drama of a Madman And a Cat who must Face a A Shark in An Abandoned Amusement Park                                       |
|      54 | A Awe-Inspiring Drama of a Car And a Pastry Chef who must Chase a Crocodile in The First Manned Space Station                   |
|      55 | A Awe-Inspiring Story of a Feminist And a Cat who must Conquer a Dog in A Monastery                                             |
+---------+---------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

执行计划：

mysql> explain select film_id, description from sakila.film order by title limit 50, 5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: film
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1000
     filtered: 100.00
        Extra: Using filesort
1 row in set, 1 warning (0.00 sec)

可以看到使用了filesort方式，需要优化。

使用有索引的列或者主键进行order by操作

mysql> explain select film_id, description from sakila.film order by film_id limit 50, 5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: film
   partitions: NULL
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 2
          ref: NULL
         rows: 55
     filtered: 100.00
        Extra: NULL
1 row in set, 1 warning (0.00 sec)

记录上次返回的主键，在下一次查询时使用主键过滤

mysql> explain select film_id, description from sakila.film where film_id>55 and film_id<=60 order by film_id limit 1, 5\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: film
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 2
          ref: NULL
         rows: 5     
     filtered: 100.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

索引优化

如何选择合适的列创建索引

where从句，group by从句，order by从句，on从句中出现的列
索引字段越小越好
离散度大的列放到索引列的前面:下面的例子中，customer_id需要放在staff_id前面。

mysql> desc payment;
+--------------+----------------------+------+-----+-------------------+-----------------------------+
| Field        | Type                 | Null | Key | Default           | Extra                       |
+--------------+----------------------+------+-----+-------------------+-----------------------------+
| payment_id   | smallint(5) unsigned | NO   | PRI | NULL              | auto_increment              |
| customer_id  | smallint(5) unsigned | NO   | MUL | NULL              |                             |
| staff_id     | tinyint(3) unsigned  | NO   | MUL | NULL              |                             |
| rental_id    | int(11)              | YES  | MUL | NULL              |                             |
| amount       | decimal(5,2)         | NO   |     | NULL              |                             |
| payment_date | datetime             | NO   | MUL | NULL              |                             |
| last_update  | timestamp            | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------------+----------------------+------+-----+-------------------+-----------------------------+
7 rows in set (0.00 sec)
mysql> select count(distinct customer_id), count(distinct staff_id) from payment;
+-----------------------------+--------------------------+
| count(distinct customer_id) | count(distinct staff_id) |
+-----------------------------+--------------------------+
|                         599 |                        2 |
+-----------------------------+--------------------------+
1 row in set (0.01 sec)

索引的维护及优化–重复及冗余索引

重复索引是指相同的列以相同的顺序建立的同类型的索引。如：
冗余索引指的是多个索引的前缀列相同，或是在联合索引中包含了主键。

查询方法:

mysql> select a.table_schema as '数据名',
a.table_name as '表名',
a.index_name as '索引1',
a.column_name as '重复列名'
from statistics a join statistics b
on a.table_schema=b.table_schema
and a.table_name = b.table_name
and a.seq_in_index = b.seq_in_index
and a.column_name = b. column_name
where a.seq_in_index=1 and a.index_name <> b.index_name\G

删除不使用的索引，MySQL只能查询慢查日志来分析。

数据库结构优化

选择合适的数据类型

使用可以存下你的数据的最小的数据类型
使用简单的数据类型，int要比varcahr类型在MySQL上面处理简单。
尽可能的使用not null定义字段。
尽量少用text类型，非用不可时最好考虑分表。

表的范式化和反范式话

范式话是指数据库设计的规范，目前说的范式话一般是指第三设计范式，也就是要求数据表中不存在非关键字段对任意候选关键字段的函数依赖则符合第三范式。
- 不符合范式的问题：解决方法为对表进行拆分。
  - 数据冗余
  - 数据的插入异常
  - 数据的更新异常
  - 数据的删除异常
反范式化是指为了查询效率的考虑把原来符合第三范式的表的适当增加冗余，以达到优化查询效率的目的，反范式化是一种以空间来换取时间的操作。

表的垂直拆分

垂直拆分指的是把原有一个很多列的表拆分成多个表，这解决了表的宽度问题。拆分原则如下：

不常用字段放入到一个表中
把大字段独立存放到一个表中
经常一起使用的字段放到一个表中

表的水平拆分

水平拆分是为了解决表的数据量过大的问题，水平拆分的表每一个表的结构都是完全一致的。

常用拆分方法为：对id进行hash运算，如果要拆分为100个子表，则对100取余，根据不同的结果将其存放到对应的表中。存在的挑战包括跨分区表进行数据查询，统计及后台报表操作。

系统配置优化

操作系统配置优化

数据库是基于操作系统的，目前大多数MySQL都是安装在Linux上，所以对于操作系统的一些参数配置也会影响到MySQL的性能。

tcp支持的队列数
减少断开连接时，资源回收
打开文件数的限制，ulimit -a查询各位限制。
关闭iptables，selinux等防火墙

Mysql配置文件优化

MySQL可以通过启动时指定配置参数和使用配置文件两种方式进行配置，在大多数情况下，配置文件位于/etc/my.cnf或是/etc/mysql/my.cnf。如果存在多个位置存在配置文件，后面的会对前面的进行覆盖。

innodb_buffer_pool_size：配置Innodb的缓冲池，如果数据库中只有Innodb表，则推荐配置量为总内存的75%。
innodb_buffer_pool_instance：可以控制缓冲池的个数，默认一个
innodb_log_buffer_size：Innodb log 缓冲的大小，由于日志最长每秒刷新一次，一般不大
innodb_flush_log_at_trx_commit：关键参数，数据库多久将变更写入磁盘，对Innodb的IO效率影响很大。默认为1，每次提交都会将变更刷新到磁盘。0每次提交不刷新，每1s刷新到磁盘一次。2将每次提交刷新到缓冲区，每1s将缓冲区刷新到磁盘。一般设置为2。
innodb_read_io_threads,innodb_write_io_threads：Innodb读写的IO进程数，默认为4
innode_file_per_table：关键参数，控制Innodb每一个表使用独立的表空间。默认为OFF，也就是所有的表都会建立在共享表空间中。
innodb_stats_on_metadata：决定MySQL什么情况下会刷新innodb表的统计信息。

第三方配置工具，Percona Tools

服务器硬件优化

CPU：单核频率更高，MySQL对CPU服务器
磁盘IO：RAID
- RAID0：条带，多个磁盘链接成一个硬盘使用，IO最好，但是一个损坏，则所有数据丢失
- RAID1：镜像，至少两个磁盘，每组磁盘存储的数据相同。
- RAID5：多个硬盘合并成一个逻辑盘使用，数据读写时会建立奇偶校验信息，并且奇偶校验信息和相对应的数据分别存储于不同的磁盘上。当RAID5的一个磁盘数据发生损坏后，利用剩下的数据的相应的奇偶校验信息取恢复被损坏的数据。
- RAID1+0：RAID0和RAID1结合使用，数据库建议使用这个级别。
SNA和NAT是否使用数据库：
- 常用于高可用解决方案
- 顺序读写效率很高，但是随机读写很差
- 数据库随机读写的比率很高