Debugging sleeping connections with MySQL

Have you ever seen connection in the SHOW PROCESSLIST output which is in “Sleep” state for a long time and you have no idea why this would happen ?

I see if frequently with web applications and it is often indication of trouble. Not only it means you may run out of MySQL connections quicker than you expected but it also frequently indicates serious problems in the application. If you do not use persistent connections and you have connection in Sleep stage for 600 seconds what could it be ? It may mean some of your pages take that long to generate (or might be the code simply gets into the tight loop and page never gets generated) it also could mean some of external Web Services are slow or not available and you’re not dealing with timeouts properly. Or may be you have several connections to MySQL server and right now running query which takes that long ? In any case it is something frequently worth looking at.

First task is to find to which process the connection belongs. Using different user names for different application is a good practice however it will not tell you which of apache children is handling request in question. If you just want to fix it, ie by restarting apache it is enough but if you want to figure our why it is happening you need more info.

You my notice in the “Host” filed of SHOW PROCESSLIST output not only host but also port is specified, showing you something like “192.168.1.70:58555″ This port can be used to identify the process which owns connection in question:

[root@w1 ~]# netstat -ntp | grep :45384
tcp        0      0 192.168.1.70:45384          192.168.1.82:3306           ESTABLISHED 28540/php-cgi

As you can see in this case we can find php-cgi is holding connection in question (this is lighttpd based system with fastcgi)

Now you know the process and you can use your favorite tools to check what that process is doing.

[root@w1 ~]# netstat -ntp | grep 28540
tcp        0      0 192.168.1.70:58555          192.168.1.90:11211          ESTABLISHED 28540/php-cgi
tcp        0      0 192.168.1.70:52711          192.168.1.88:8080           ESTABLISHED 28540/php-cgi
tcp        0      0 192.168.1.70:45384          192.168.1.82:3306           ESTABLISHED 28540/php-cgi
tcp        0      0 192.168.1.70:45399          192.168.1.82:3306           ESTABLISHED 28540/php-cgi
tcp        0      0 192.168.1.70:45407          192.168.1.82:3306           ESTABLISHED 28540/php-cgi
tcp        0      0 192.168.1.70:45408          192.168.1.82:3306           ESTABLISHED 28540/php-cgi
tcp        0      0 192.168.1.70:35556          192.168.1.92:11211          ESTABLISHED 28540/php-cgi

Using same netstat command and filtering on the PID we can find which connections does this process have. Here you can see it has couple of memcached connections. Few MySQL connections (to the same host, which if usually bad idea) and connection to some external web server.

You can use strace -p to see what host is doing, it often gives a clue. In this case I for example found the process is stuck in pool() system call reading from network. Using netstat can give you an idea what it can be but if you do not like guessing you can use gdb -p . It will not print you exact line of code in PHP which is running but can give you some good ideas – for example in this case I could find stack trace originated from php stream functions not from libmysql or memcache.so, which means it is not MySQL or memcache connections leaving last candidate as the only choice. I also could see some of the variables in GDB “bt” command output which also hinted what could be the problem.

By the way does anyone know any debugger which can connect to PHP process or apache with mod_php and provide backtrace in PHP terms not the one for zend engine ? That would be pretty cool.

Yet another great tool which you can use is server-status if you’re running apache. This way you will see the URL which that process is processing and so get few more hints on what may be happening or even get repeatable example in some cases.

The tools I mentioned regarding figuring our what is happening with the process are not only helpful to debug sleeping connections with MySQL but many other cases when you see web application locking up or starting to runs in the tight loop consuming too much CPU time.

If you know any other tools which could be helpful in this regard would appreciate your comments. There might be some smarter tools out where for production tracing.