MySQL case study: an unusual [ERROR] Can't create a new thread (errno 11)


Scenario:
MySQL 5.7.17; the application reports this exception:


    OperationalError: (1135, "Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug")
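A quick aside on the number in that message: on Linux, errno 11 is EAGAIN, which is also the code pthread_create() returns when the system lacks the resources to create another thread. The small sketch below (standard library only; it assumes a Linux host, because errno numbering is platform-specific) decodes such a number. errno_names and describe_errno are helper names made up for this example.

```python
import errno
import os
from typing import List


def errno_names(code: int) -> List[str]:
    """All symbolic names in the errno module with this value (EAGAIN and EWOULDBLOCK share one)."""
    return sorted(
        name for name in dir(errno)
        if name.startswith("E") and getattr(errno, name) == code
    )


def describe_errno(code: int) -> str:
    names = "/".join(errno_names(code)) or "unknown"
    # os.strerror() returns the C library's message text for the number
    return "errno {} = {}: {}".format(code, names, os.strerror(code))


if __name__ == "__main__":
    # The number from MySQL's "Can't create a new thread (errno 11)" message
    print(describe_errno(11))
```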

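Conclusion (up front):
It is definitely not the open-files limit or innodb_open_files~
PS: if it were, this blog post would not exist~
But let me keep you in suspense for now~ \(≧▽≦)/~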


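What is unusual:
Once the application has created about 32,300+ database connections, connection errors always appear. After some connections are closed, things return to normal, but as soon as the count reaches 32,300+ again, the problem comes back.

It reproduces every time in the test environment, and both 5.7.17 and 5.7.19 are affected.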



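Analysis:
The first suspects were the open-files limit and innodb_open_files, but adjusting them made no difference.

Even after tuning the memory-related settings, the open-files settings and so on, the problem remained. That suggested the problem might not be in MySQL at all: could it be some OS-level limit, or a bug?

So I compiled MySQL 5.7.19 with debugging enabled and wrote a simple Python script to hold 32,300+ database connections: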



    import time

    import MySQLdb  # provided by the MySQL-python / mysqlclient package

    loop = 10000    # this process opens loop - 1 connections
    conn_list = []  # keep a reference to every connection so none get closed

    def my_conn(ip):
        return MySQLdb.connect(host=ip, port=3306, user='temp', passwd='test')

    def conn_test(ip):
        # Open the connections and keep them alive
        for i in range(1, loop):
            conn_list.append(my_conn(ip))
        # Then idle forever, printing a counter to show the script is still running
        num = 0
        while True:
            print(num)
            if num == loop - 1:
                num = 0
                time.sleep(10)
            num += 1
            time.sleep(1)

    if __name__ == '__main__':
        conn_test("192.168.1.1")

After many attempts, I confirmed that the error always occurs when the 32,373rd connection is created. Let's look at the MySQL trace:

Trace when the problem occurs:


    T@0: >Per_thread_connection_handler::add_connection
    T@0: | >my_raw_malloc
    T@0: | | my: size: 232 my_flags: 16
    T@0: | | exit: ptr: 0x44046ec0
    T@0: | <my_raw_malloc 219
    T@0: | >my_free
    T@0: | | my: ptr: 0x44046ec0
    T@0: | <my_free 292
    T@0: >Per_thread_connection_handler::add_connection

Trace in the normal case:


    T@0: >Per_thread_connection_handler::add_connection
    T@0: | >my_raw_malloc
    T@0: | | my: size: 232 my_flags: 16
    T@0: | | exit: ptr: 0x4238d9c0
    T@0: | <my_raw_malloc 219
    T@0: | info: Thread created
    T@0: <Per_thread_connection_handler::add_connection 425
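To compare the two traces a bit more mechanically, here is a hypothetical helper (classify_add_connection is not part of MySQL or its DBUG library) that applies the same reading: an add_connection call that logs "info: Thread created" succeeded, while one that reaches my_free without that line failed. It assumes the "T@<n>: " prefix and "| " nesting markers shown above; it is a heuristic for these traces, not a general DBUG trace parser.

```python
from typing import List

# Entry line DBUG prints when add_connection() is called
ENTER = ">Per_thread_connection_handler::add_connection"


def classify_add_connection(trace_lines: List[str]) -> List[str]:
    """Label each add_connection() call in a DBUG trace.

    'created': the call logged "info: Thread created"
    'failed':  the call reached my_free without logging "Thread created"
    'unknown': neither marker was seen (e.g. the trace was cut off)
    """
    calls = []  # one [created, freed] pair per add_connection() call
    for raw in trace_lines:
        # Drop the "T@<thread>: " prefix, then the "| " nesting markers
        _, _, body = raw.partition(": ")
        body = body.lstrip("| ").strip()
        if body == ENTER:
            calls.append([False, False])
        elif calls and body == "info: Thread created":
            calls[-1][0] = True
        elif calls and body == ">my_free":
            calls[-1][1] = True

    labels = []
    for created, freed in calls:
        if created:
            labels.append("created")
        elif freed:
            labels.append("failed")
        else:
            labels.append("unknown")
    return labels


if __name__ == "__main__":
    failing = """\
T@0: >Per_thread_connection_handler::add_connection
T@0: | >my_raw_malloc
T@0: | <my_raw_malloc 219
T@0: | >my_free
T@0: | <my_free 292
T@0: >Per_thread_connection_handler::add_connection"""
    print(classify_add_connection(failing.splitlines()))
```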

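So, just as the error message says, MySQL ran into trouble while creating a new connection. More precisely, the failure happened right after MySQL allocated the memory needed for the connection, so it freed that memory again and raised the error.
Let's see what MySQL does in this method: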


    // connection_handler_per_thread.cc

    bool Per_thread_connection_handler::add_connection(Channel_info* channel_info)
    {
      int error= 0;
      my_thread_handle id;

      DBUG_ENTER("Per_thread_connection_handler::add_connection");

      // Simulate thread creation for test case before we check thread cache
      DBUG_EXECUTE_IF("fail_thread_create", error= 1; goto handle_error;);

      if (!check_idle_thread_and_enqueue_connection(channel_info))
        DBUG_RETURN(false);

      /*
        There are no idle threads avaliable to take up the new
        connection. Create a new thread to handle the connection
      */
      channel_info->set_prior_thr_create_utime();
      error= mysql_thread_create(key_thread_one_connection, &id,     // <---- here, error is non-zero
                                 &connection_attrib,
                                 handle_connection,
                                 (void*) channel_info);
    #ifndef DBUG_OFF
    handle_error:
    #endif // !DBUG_OFF

      if (error)                                                     // <---- so we end up in this if branch
      {
        connection_errors_internal++;
        if (!create_thd_err_log_throttle.log())
          sql_print_error("Can't create thread to handle new connection(errno= %d)",
                          error);
        channel_info->send_error_and_close_channel(ER_CANT_CREATE_THREAD,
                                                   error, true);
        Connection_handler_manager::dec_connection_count();
        DBUG_RETURN(true);
      }

      Global_THD_manager::get_instance()->inc_thread_created();
      DBUG_PRINT("info",("Thread created"));
      DBUG_RETURN(false);
    }


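Since mysql_thread_create is what failed, I kept following the call chain; after unwinding the various #define wrappers, it ends up in this code.
PS: the my_free in the trace is an important clue; from it we can confirm that the problem is not in MySQL's own code~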



    /* my_thread.c */

    int my_thread_create(my_thread_handle *thread, const my_thread_attr_t *attr,
                         my_start_routine func, void *arg)
    {
    #ifndef _WIN32
      return pthread_create(&thread->thread, attr, func, arg);
    #else
      ......
    #endif
    }

As you can see, starting from add_connection and passing through a chain of calls, the error value is ultimately whatever pthread_create returns.

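The function that fails, pthread_create, is part of glibc, so even in gdb you cannot step through its source; and debugging with gdb after holding 32,000+ connections would be painfully slow....
(╯‵□′)╯︵┻━┻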

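So I googled pthread_create together with the phrase "Can't create a new thread" and found some information. Broadly, it says that certain Linux system-level parameters limit the number of threads that can be created.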
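To see which of those system-level limits a host actually has, a short Python sketch like the following reads them from /proc, plus the per-user process limit that ulimit -u reports. It assumes Linux; thread_related_limits and read_int are made-up helper names, and the list of tunables is simply the ones that come up in the mailing-list discussion quoted in this post (threads-max, pid_max, max_map_count, ulimit -u), not an exhaustive set.

```python
import resource
from pathlib import Path
from typing import Dict, Optional

# Linux tunables that bound how many threads a process can create
PROC_LIMITS = {
    "kernel.threads-max": "/proc/sys/kernel/threads-max",
    "kernel.pid_max": "/proc/sys/kernel/pid_max",
    "vm.max_map_count": "/proc/sys/vm/max_map_count",
}


def read_int(path: str) -> Optional[int]:
    """Read a single integer from a /proc file; None if it is missing or unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def thread_related_limits() -> Dict[str, Optional[int]]:
    limits = {name: read_int(path) for name, path in PROC_LIMITS.items()}
    # Same value as `ulimit -u`; RLIM_INFINITY is reported as None ("unlimited")
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    limits["ulimit -u (soft)"] = None if soft == resource.RLIM_INFINITY else soft
    return limits


if __name__ == "__main__":
    for name, value in thread_related_limits().items():
        print("{:<20} {}".format(name, "unlimited/unavailable" if value is None else value))
```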

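This looked promising, so I dug further and found an old mailing-list discussion about exactly this: being unable to create more than 32K threads.
Related links: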
https://listman.redhat.com/archives/phil-list/2003-August/msg00005.html
https://listman.redhat.com/archives/phil-list/2003-August/msg00010.html
https://listman.redhat.com/archives/phil-list/2003-August/msg00025.html

Here is the discussion:


    Hi,
    
    I was using the 'thread-limit' program from http://people.redhat.com/alikins/tuning_utils/thread-limit.c to test the
    number of threads it could create.  It seems that it was always hitting
    some limit at 32K threads (cannot create thread 32762, to be exact). The
    error is ENOMEM.  Here's the kernel/ulimit settings,
    
    /proc/sys/kernel/pid_max 300000
    /proc/sys/kernel/threads-max 100000
    
    ulimit -a
    core file size        (blocks, -c) 0
    data seg size         (kbytes, -d) unlimited
    file size             (blocks, -f) unlimited
    max locked memory     (kbytes, -l) unlimited
    max memory size       (kbytes, -m) unlimited
    open files                    (-n) 100000
    pipe size          (512 bytes, -p) 8
    stack size            (kbytes, -s) 32
    cpu time             (seconds, -t) unlimited
    max user processes            (-u) 100000
    virtual memory        (kbytes, -v) unlimited
    
    It gave the same result on both a Debian 3 box with NPTL 0.56 compiled
    with gcc 3.4 CVS and GlibC CVS, kernel 2.5.70, and vanilla Redhat 9.
    
    I know I must be missing something because 100K threads with NPTL was
    reported.  Thanks.
    
    -- 
    Feng Zhou
    Graduate Student,
    CS Division, U.C. Berkeley http://www.cs.berkeley.edu/~zf/



    
    
    On Fri, 7 Aug 2003, Feng Zhou wrote:
    
    > I was using the 'thread-limit' program from
    > http://people.redhat.com/alikins/tuning_utils/thread-limit.c to test the
    > number of threads it could create.  It seems that it was always hitting
    > some limit at 32K threads (cannot create thread 32762, to be exact). The
    > error is ENOMEM.  Here's the kernel/ulimit settings,
    
    what is the current value of your /proc/sys/vm/max_map_count tunable? Can
    you max out RAM if you double the current limit?
    
    	Ingo



    Yes, that's it.  Actually I have to change MAX_MAP_COUNT in
    include/linux/sched.h and recompile the 2.5.70 kernel because it doesn't
    have such a sysctl file.  After doubling the value from 65536 to 131072,
    I can create 65530 thread before it fails with ENOMEM.  
    
    BTW, the system begins thrashing at around 63000 threads, where resident
    set of the process is around 250MB.  This makes sense to me because each
    empty thread actually uses the first 4K page in its 16K stack.  Given
    the system has 1GB of physical memory.  The kernel memory each thread
    uses seems to be around 12KB ((1GB-250MB)/63000).
    
    - Feng Zhou
    
    On Mon, 2003-08-11 at 02:29, Ingo Molnar wrote:
    > On Fri, 7 Aug 2003, Feng Zhou wrote:
    > 
    > > I was using the 'thread-limit' program from
    > > http://people.redhat.com/alikins/tuning_utils/thread-limit.c to test the
    > > number of threads it could create.  It seems that it was always hitting
    > > some limit at 32K threads (cannot create thread 32762, to be exact). The
    > > error is ENOMEM.  Here's the kernel/ulimit settings,
    > 
    > what is the current value of your /proc/sys/vm/max_map_count tunable? Can
    > you max out RAM if you double the current limit?
    > 
    > 	Ingo
    > 

In fact, just as the discussion says, after raising max_map_count MySQL can also create more than 32,000 connections!
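Why does a memory-mapping limit cap the number of threads? Every thread created through pthread_create gets its own mmap'ed stack, normally paired with a guard page that has different protections, so each thread adds roughly two entries to the process's list of mappings, and vm.max_map_count caps the length of that list. In current glibc, a failure to map a new stack (ENOMEM) is reported by pthread_create as EAGAIN, which would explain why MySQL shows errno 11 while the 2003 thread saw ENOMEM. The sketch below is a rough way to observe the per-thread mappings from Python on Linux, assuming /proc/self/maps is readable and that Python threads are backed by pthreads (true for CPython on Linux). count_maps and maps_added_by_threads are hypothetical helper names; exact counts vary with the libc version, its stack cache and malloc arenas.

```python
import threading
from typing import Tuple


def count_maps(pid: str = "self") -> int:
    """Number of memory mappings (VMAs) of a process: one line per mapping in /proc/<pid>/maps."""
    with open("/proc/{}/maps".format(pid)) as f:
        return sum(1 for _ in f)


def maps_added_by_threads(n_threads: int = 64) -> Tuple[int, int]:
    """Start n idle threads; return (maps before, maps while they are all alive)."""
    release = threading.Event()
    before = count_maps()
    threads = [threading.Thread(target=release.wait) for _ in range(n_threads)]
    for t in threads:
        t.start()  # CPython on Linux calls pthread_create() here
    during = count_maps()
    release.set()  # let every thread return
    for t in threads:
        t.join()
    return before, during


if __name__ == "__main__":
    n = 64
    before, during = maps_added_by_threads(n)
    print("maps before: {}, with {} threads alive: {}, about {:.1f} per thread".format(
        before, n, during, (during - before) / n))
```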

So what are the side effects of raising this parameter?
This article describes its impact: https://www.novell.com/support/kb/doc.php?id=7000830

The key part:


    How are they affected? Well, since there will be more elements in the VM red-black tree, all operations on the VMA will take longer. The slow-down of most operations is logarithmic, e.g. further mmap's, munmap's et al. as well as handling page faults (both major and minor). Some operations will slow down linearly, e.g. copying the VMAs when a new process is forked.

    In short, there is absolutely no impact on memory footprint or performance for processes which use the same number of maps. On the other hand, processes where one of the memory mapping functions would have failed with ENOMEM because of hitting the limit, will now be allowed to consume the additional kernel memory with all the implications described above.


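Conclusion:
So after raising vm.max_map_count (for example with sysctl -w vm.max_map_count=131072), the application no longer gets the exception~
Judging from the actual test results, the default value of 65530 supports 32,000+ connections. Since each thread needs roughly two memory mappings (its stack plus a guard page), doubling the value should roughly double the connection ceiling.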
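As a back-of-the-envelope check on the "double the setting, double the connections" estimate, here is a tiny calculation under the same assumption of about two mappings per thread. max_threads_estimate is a hypothetical helper, and baseline_maps (the mappings mysqld already holds, e.g. the line count of /proc/<pid>/maps) has to be measured on your own server.

```python
def max_threads_estimate(max_map_count: int, baseline_maps: int = 0,
                         maps_per_thread: int = 2) -> int:
    """Rough count of extra threads a process can start before new stack
    mappings would exceed vm.max_map_count.

    baseline_maps:   mappings the process already holds (lines in /proc/<pid>/maps)
    maps_per_thread: assumption, about 2 with glibc (stack + guard page)
    """
    if maps_per_thread <= 0:
        raise ValueError("maps_per_thread must be positive")
    return max(0, (max_map_count - baseline_maps) // maps_per_thread)


if __name__ == "__main__":
    for limit in (65530, 131060):
        print(limit, "->", max_threads_estimate(limit))
```

With the default 65530 and no baseline this gives 65530 // 2 = 32765, in the same ballpark as the 32,373rd connection where the test failed; the gap of roughly 390 threads would then be the mappings mysqld already held for libraries, heap, buffers and background threads (an assumption, not something measured in this test). Doubling max_map_count to 131060 raises the estimate to 65530.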




