InnoDB下binlog和redo log的写入机制

binlog的参数:

sync_binlog:
Scope:Global、Dynamic:Yes、Default Value (>= 5.7.7) 1、Default Value (<= 5.7.6) 0、Minimum Value 0、Maximum Value 4294967295
sync_binlog=0: Disables synchronization of the binary log to disk by the MySQL server.Instead, the MySQL server relies on the operating system to flush the binary log to disk from time to time as it does for any other file. This setting provides the best performance, but in the event of a power failure or operating system crash, it is possible that the server has committed transactions that have not been synchronized to the binary log.
sync_binlog=1: Enables synchronization of the binary log to disk before transactions are committed. This is the safest setting but can have a negative impact on performance due to the increased number of disk writes. In the event of a power failure or operating system crash,transactions that are missing from the binary log are only in a prepared state. This permits the automatic recovery routine to roll back the transactions, which guarantees that no transaction is lost from the binary log.
sync_binlog=N, where N is a value other than 0 or 1: The binary log is synchronized to disk after N binary log commit groups have been collected. In the event of a power failure or operating system crash, it is possible that the server has committed transactions that have not been flushed to the binary log. This setting can have a negative impact on performance due to the increased number of disk writes. A higher value improves performance, but with an increased risk of data loss.

binlog_group_commit_sync_delay:
Scope:Global、Dynamic:Yes、Default Value 0、Minimum Value 0、Maximum Value 1000000
Controls how many microseconds the binary log commit waits before synchronizing the binary log file to disk. By default binlog_group_commit_sync_delay is set to 0, meaning that there is no delay. 
Setting binlog_group_commit_sync_delay to a microsecond delay enables more transactions to be synchronized together to disk at once, reducing the overall time to commit a group of transactions because the larger groups require fewer time units per group.

When sync_binlog=0 or sync_binlog=1 is set, the delay specified by binlog_group_commit_sync_delay is applied for every binary log commit group before synchronization (or in the case of sync_binlog=0, before proceeding). 
When sync_binlog is set to a value n greater than 1, the delay is applied after every n binary log commit groups.

binlog_group_commit_sync_no_delay_count:
Scope:Global、Dynamic:Yes、Default Value 0、Minimum Value 0、Maximum Value 1000000
The maximum number of transactions to wait for before aborting the current delay as specified by binlog_group_commit_sync_delay

关于binlog的写入机制:

1.一个事务的binlog是不能被拆开的,所以要保证binlog一次性写入到binlog file中
2.每个线程对应一个binlog cache,所占内存大小由参数binlog_cache_size控制,超过该参数设定值时就需要使用tmpfile暂存到磁盘
3.binlog写入顺序:binlog写入binlog cache中---->binlog从binlog cache写入到page cache(write)---->binlog从page cache持久化到磁盘上(flush),其中write由Mysql Server执行,flush由操作系统调用fsync()执行

sync_binlog的解释:
sync_binlog=0:每次事务提交只write,不flush
sync_binlog=1:每次事务提交都进行write和flush
sync_binlog=N:每次事务提交只write,N次事务提交以后执行flush

考虑到Mysql Server如果Crash,那些还没执行flush的binlog都会丢失,所以sync_binlog<>1都会存在数据丢失的风险

redo log的参数:

innodb_flush_log_at_trx_commit:
Scope:Global、Dynamic:Yes、Default Value 1
The default setting of 1 is required for full ACID compliance. Logs are written and flushed to disk at each transaction commit.
With a setting of 0, logs are written and flushed to disk once per second. Transactions for which logs have not been flushed can be lost in a crash.
With a setting of 2, logs are written after each transaction commit and flushed to disk once per second. Transactions for which logs have not been flushed can be lost in a crash.

For settings 0 and 2, once-per-second flushing is not 100% guaranteed. Flushing may occur more frequently due to DDL changes and other internal InnoDB activities that cause logs to be flushed independently of the innodb_flush_log_at_trx_commit setting, and sometimes less frequently due to scheduling issues. 
If logs are flushed once per second, up to one second of transactions can be lost in a crash. If logs are flushed more or less frequently than once per second, the amount of transactions that can be lost varies accordingly.

Log flushing frequency is controlled by innodb_flush_log_at_timeout, which allows you to set log flushing frequency to N seconds (where N is 1 ... 2700, with a default value of 1).However, any mysqld process crash can erase up to N seconds of transactions.

innodb_flush_log_at_timeout:
Scope:Global、Dynamic:Yes、Default Value 1、Minimum Value 1、Maximum Value 2700
Write and flush the logs every N seconds. innodb_flush_log_at_timeout allows the timeout period between flushes to be increased in order to reduce flushing and avoid impacting performance of binary log group commit. 
The default setting for innodb_flush_log_at_timeout is once per second.

关于redo log的写入机制:

1.区别于binlog每个线程一个binlog cache,redo log都是统一写到redo log buffer中的
2.redo log写入顺序:redo log写入redo log buffer中---->redo log从redo log buffer写入到page cache(write)---->redo log从page cache持久化到磁盘上(flush),其中write由Mysql Server执行,flush由操作系统调用fsync()执行
3.InnoDB有个后台线程,每隔一秒,就会把redo log buffer中的日志,调用write写到page cache,然后再flush到磁盘上
4.当redo log buffer 占用的空间即将达到innodb_log_buffer_size一半的时候,后台线程会主动写盘(write)
5.innodb_flush_log_at_trx_commit=1时,由于redo log都是存在redo log buffer里面的,所以会存在A事务尚未commit,但其redo log已经被B事务的commit操作"顺带" write and flush了
6.innodb_flush_log_at_trx_commit=1时,事务在prepare时,redo log也需要write and flush(准确说是要在binlog前先落盘,以保证如果binlog是全的,那么redo log就一定是全的)

innodb_flush_log_at_trx_commit的解释:
innodb_flush_log_at_trx_commit=0:redo log只写到redo log buffer里,write和flush都交由InnoDB的后台进程处理,事务提交时不做处理
innodb_flush_log_at_trx_commit=1:每次事务提交都进行write和flush
innodb_flush_log_at_trx_commit=2:每次事务提交时只进行write,flush交由InnoDB的后台进程处理

两阶段提交:
写redo log,事务prepare--->写binlog--->事务commit
两阶段提交细节拆分:
写redo log(write),事务prepare--->写binlog(write)--->redo log flush--->binlog flush--->事务commit,写redo log(write)

redo log组提交:
利用LSN,在redo log write和flush之间有段binlog write的时间,这段时间内的完成redo log write可以一起fsync,从而达到组提交的效果

binlog组提交:
1.在binlog的write和flush之间加入了redo log flush的操作,以此来拉长binlog write和flush之前的时间间隔,在这期间可以收集更多事务的binlog,然后一起进行flush
2.利用binlog_group_commit_sync_delay和binlog_group_commit_sync_no_delay_count来实现group提交:
    binlog_group_commit_sync_delay表示延迟多少微秒后才调用fsync
    binlog_group_commit_sync_no_delay_count表示累积多少次后才调用sync
二者是或的关系,满足其一即可调用fsync

Mysql crash recovery时两种日志的协作:

1.redo log里有commit记录则事务完成,无需处理
2.redo log里无commi记录,则通过binlog的完整性判断,binlog完整的话事务无需处理,binlog不完整则代表事务未完成,crash recovery时需要回滚
(因为根据2PC,redo log prepare阶段的redo log落盘的时间节点是在binlog落盘前,也就是binlog完全落盘则表明redo log prepare阶段的redo log已经落盘完成)

上一篇:Mysql 查询缓存


下一篇:PHP ob_flush();在IE10中不起作用