Debugging sleeping connections with MySQL

Have you ever seen a connection in the SHOW PROCESSLIST output that has been in the “Sleep” state for a long time, and had no idea why?

I see it frequently with web applications, and it is often an indication of trouble. Not only does it mean you may run out of MySQL connections sooner than you expected, it also frequently indicates serious problems in the application. If you do not use persistent connections and you have a connection in the Sleep state for 600 seconds, what could it be? It may mean some of your pages take that long to generate (or the code simply gets into a tight loop and the page never gets generated). It could also mean some external web services are slow or unavailable and you are not handling timeouts properly. Or maybe the process holds several connections to the MySQL server and is currently running a query on another one that takes that long. In any case it is frequently worth looking at.

The first task is to find which process the connection belongs to. Using different user names for different applications is good practice, but it will not tell you which Apache child is handling the request in question. If you just want to fix it, e.g. by restarting Apache, that is enough, but if you want to figure out why it is happening you need more information.

You may notice that the “Host” field of the SHOW PROCESSLIST output specifies not only the host but also the port, showing something like “192.168.1.70:45384”. This port can be used to identify the process that owns the connection in question:

[root@w1 ~]# netstat -ntp | grep :45384
tcp        0      0 192.168.1.70:45384          192.168.1.82:3306           ESTABLISHED 28540/php-cgi

As you can see, in this case php-cgi is holding the connection in question (this is a lighttpd-based system with FastCGI).
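The lookup above can be wrapped in a small helper. This is only a sketch: the function names port_from_host and owner_of_port are made up for illustration, and it assumes a Linux netstat that supports -ntp and that you run it as root on the client host so that PID/program names are shown.

```shell
# Extract the client port from a SHOW PROCESSLIST "Host" value such as
# "192.168.1.70:45384" (everything after the last colon).
port_from_host() {
    printf '%s\n' "${1##*:}"
}

# Print the netstat line(s) whose local address ends in exactly that port.
# Hypothetical helper; needs root to see the PID/program column.
owner_of_port() {
    netstat -ntp 2>/dev/null | awk -v port="$1" '$4 ~ ":" port "$"'
}

# Usage (on the web server):
#   owner_of_port "$(port_from_host 192.168.1.70:45384)"
```

Matching on the end of the local address avoids false hits that a plain grep for ":45384" could produce on longer port numbers or on the remote side.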

Now you know the process and you can use your favorite tools to check what that process is doing.

[root@w1 ~]# netstat -ntp | grep 28540
tcp        0      0 192.168.1.70:58555          192.168.1.90:11211          ESTABLISHED 28540/php-cgi
tcp        0      0 192.168.1.70:52711          192.168.1.88:8080           ESTABLISHED 28540/php-cgi
tcp        0      0 192.168.1.70:45384          192.168.1.82:3306           ESTABLISHED 28540/php-cgi
tcp        0      0 192.168.1.70:45399          192.168.1.82:3306           ESTABLISHED 28540/php-cgi
tcp        0      0 192.168.1.70:45407          192.168.1.82:3306           ESTABLISHED 28540/php-cgi
tcp        0      0 192.168.1.70:45408          192.168.1.82:3306           ESTABLISHED 28540/php-cgi
tcp        0      0 192.168.1.70:35556          192.168.1.92:11211          ESTABLISHED 28540/php-cgi

Using the same netstat command and filtering on the PID, we can find which connections this process has. Here you can see it has a couple of memcached connections, a few MySQL connections (to the same host, which is usually a bad idea) and a connection to some external web server.
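When a process holds many connections, it can help to group them by remote endpoint. The following is a minimal sketch, assuming the column layout of the netstat -ntp output shown above (remote address in column 5, PID/program in column 7); the function name connections_by_remote is hypothetical.

```shell
# Read `netstat -ntp` output on stdin and print "count remote_address"
# for every connection owned by the given PID, busiest endpoint first.
connections_by_remote() {
    awk -v pid="$1" '
        index($7, pid "/") == 1 { count[$5]++ }
        END { for (remote in count) print count[remote], remote }
    ' | sort -rn
}

# Usage:
#   netstat -ntp | connections_by_remote 28540
```

Several connections to the same MySQL server from one process show up immediately as a count greater than one.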

You can use strace -p PID to see what the process is doing; it often gives a clue. In this case, for example, I found the process was stuck in the poll() system call, reading from the network. netstat can give you an idea of what it might be waiting on, but if you do not like guessing you can use gdb -p PID. It will not print the exact line of PHP code that is running, but it can give you some good ideas. In this case I could see that the stack trace originated from PHP stream functions, not from libmysql or memcache.so, which means it was not the MySQL or memcached connections, leaving the external web server as the only candidate. I could also see some of the variables in the GDB “bt” command output, which hinted at what the problem could be.

By the way, does anyone know of a debugger that can attach to a PHP process, or to Apache with mod_php, and provide a backtrace in PHP terms rather than Zend Engine terms? That would be pretty cool.

Yet another great tool, if you’re running Apache, is server-status. It shows the URL each process is handling, which gives you a few more hints about what may be happening, or in some cases even a repeatable example.

The tools I mentioned for figuring out what is happening with a process are helpful not only for debugging sleeping MySQL connections but also in many other cases where a web application locks up or starts running in a tight loop, consuming too much CPU time.

If you know any other tools that could be helpful in this regard, I would appreciate your comments. There might be some smarter tools out there for production tracing.

ClamAV on CentOS

ClamAV is a free anti-virus program available for Linux operating systems. This section explains how to install ClamAV on 64-bit CentOS 6.

Install the EPEL repository

First, determine the most current version of the repository that is available. Using a web browser, visit http://download.fedoraproject.org/pub/epel/6/x86_64/

Note you can substitute the CentOS version ( /6/ ) with your current version.

Scroll down the page until you find epel-release-v-r.noarch.rpm, where v is your CentOS version and r is the current repository release. For this example, the current version listed is epel-release-6-8.noarch.rpm.

Log into your server as root and run the following command, using the repository version you found in the previous step:

CentOS 6.x

rpm -Uvh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

CentOS 5.x

rpm -Uvh http://mirror.pnl.gov/epel/5/x86_64/epel-release-5-4.noarch.rpm

Enable the EPEL repo by changing enabled=0 to enabled=1 in the [epel] section:

nano /etc/yum.repos.d/epel.repo
[epel]
name=Extra Packages for Enterprise Linux 6 - $basearch
#baseurl=http://download.fedoraproject.org/pub/epel/6/$basearch
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-6&arch=$basearch
failovermethod=priority
enabled=0
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6
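If you prefer not to edit the file by hand, the same change can be scripted. This is a sketch under assumptions: it relies on GNU sed (for -i), the function name enable_epel_section is made up, and the path in the usage comment is the one the epel-release package normally installs.

```shell
# Set enabled=1 only inside the [epel] section of a yum repo file.
# The sed range runs from the "[epel]" header to the next section header,
# so sections such as [epel-debuginfo] are left untouched.
enable_epel_section() {
    sed -i '/^\[epel\]/,/^\[/ s/^enabled=0/enabled=1/' "$1"
}

# Usage:
#   enable_epel_section /etc/yum.repos.d/epel.repo
```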

Install clamav


# yum -y install clamav clamd

Set clamav to start on reboot

# chkconfig clamd on

Update the clamav virus database

# /usr/bin/freshclam

If freshclam fails with the following error:

# /usr/bin/freshclam
ERROR: Please edit the example config file /etc/freshclam.conf
ERROR: Can't open/parse the config file /etc/freshclam.conf

Comment out the line containing “Example”:


# nano /etc/freshclam.conf
##
## Example config file for freshclam
## Please read the freshclam.conf(5) manual before editing this file.
##
# Comment or remove the line below.
Example

Change to


# nano /etc/freshclam.conf
##
## Example config file for freshclam
## Please read the freshclam.conf(5) manual before editing this file.
##
# Comment or remove the line below.
# Example

Run freshclam again

# /usr/bin/freshclam

Start ClamAV

# service clamd start

If the daemon fails to start with this error:

# service clamd start
Starting Clam AntiVirus Daemon: ERROR: Please edit the example config file /etc/clamd.conf
ERROR: Can't open/parse the config file /etc/clamd.conf
 [FAILED]

edit /etc/clamd.conf and comment out “Example” there as well:

##
## Example config file for the Clam AV daemon
## Please read the clamd.conf(5) manual before editing this file.
##
# Comment or remove the line below.
#Example
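Both config files need the same one-line change, so it can be done non-interactively. This is a minimal sketch assuming GNU sed; the function name comment_out_example is hypothetical.

```shell
# Comment out a bare "Example" line in each given config file.
# Lines such as "## Example config file ..." are not touched because the
# pattern is anchored to the start of the line and allows nothing after
# the word except trailing whitespace.
comment_out_example() {
    sed -i 's/^Example[[:space:]]*$/#Example/' "$@"
}

# Usage:
#   comment_out_example /etc/freshclam.conf /etc/clamd.conf
```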

Set ClamAV to run a daily scan

# nano /etc/cron.daily/clamscan
#!/bin/bash
# Set up the scan location and scan log
CLAM_SCAN_DIR="/var/www/vhosts"
CLAM_LOG_FILE="/var/log/clamav/dailyscan.log"
# Update the virus database
/usr/bin/freshclam
# Run the scan: -r recurses into subdirectories, -i reports only infected files
/usr/bin/clamscan -i -r "$CLAM_SCAN_DIR" >> "$CLAM_LOG_FILE"
# Note: MAILTO is a crontab setting and has no effect inside a shell script
MAILTO=user@domain.com

Or run clamscan directly and let it write its own log:

clamscan -i -r --log=/var/log/clamscan-date.txt /var/www/vhosts/*

Set the cron file as executable

chmod 555 /etc/cron.daily/clamscan

Test your installation and cron job

/etc/cron.daily/clamscan
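After a test run it is worth checking the log for hits. clamscan reports each infected file on a line ending in "FOUND", so a simple filter is enough. The sketch below assumes the log path used in the cron script above; the function name list_infected is made up for illustration.

```shell
# Print the lines of a clamscan log that report an infected file.
# clamscan writes such lines as "<path>: <signature name> FOUND".
list_infected() {
    grep ' FOUND$' "$1"
}

# Usage:
#   list_infected /var/log/clamav/dailyscan.log
```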

Darkleech Virus

This root-level compromise seems to affect CentOS 5.x and Plesk versions before 10.4. It affects Apache directly and requires a reinstallation. Slaving the original drive to migrate the files is acceptable, since the compromise affects the OS files themselves rather than the site data, but clamscan is still highly recommended. To determine whether a server has this compromise:

Plesk

fgrep -l "_INJECT_DO" /usr/lib*/httpd/modules/*.so

If there are any files in the output of this command the server is definitely root compromised and needs to be reinstalled immediately. The managed servers should detect this automatically but there is no harm in checking on any server you are investigating issues on.
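When checking several servers, the same test can be wrapped so that it takes the module directory as an argument. This is only a sketch of the check above; the function name find_injected_modules is hypothetical.

```shell
# Print every .so file in the given module directory that contains the
# "_INJECT_DO" marker string. The exit status is 0 if at least one file
# matched and non-zero otherwise.
find_injected_modules() {
    grep -lF "_INJECT_DO" "$1"/*.so 2>/dev/null
}

# Usage:
#   for dir in /usr/lib*/httpd/modules; do
#       find_injected_modules "$dir" && echo "COMPROMISED: $dir"
#   done
```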

WHM / cPanel

http://blog.sucuri.net/2013/04/apache-binary-backdoors-on-cpanel-based-servers.html

Since cPanel installs Apache inside /usr/local/apache and does not utilize the package managers, there is no single and simple command to detect if the Apache binary was modified.

grep -r open_tty /usr/local/apache/

If it finds open_tty in your Apache binary, the binary is likely compromised, since the original Apache binary does not contain a call to open_tty. Another interesting point is that if you try to simply replace the bad binary with a good one, you will be denied, because the attackers set the immutable attribute on the file. So you have to run chattr -ai before replacing it:

chattr -ai /usr/local/apache/bin/httpd

Troubleshoot Qmail Spam

Is the server sending spam? Try this.
http://kb.parallels.com/766

First, check that all domains have the option ‘Mail to non-existing user’ set to ‘reject’ rather than ‘forward’. You can change this setting for all domains using “Group Operations” in the “Domains” tab in Parallels Plesk Control Panel. The option “Reject mail to nonexistent user” is available since Parallels Plesk Panel 7.5.3.
Also check that all the IPs and networks in the white lists are reliable and familiar to you.

Check how many messages are in the queue with Qmail:

# /var/qmail/bin/qmail-qstat

messages in queue: 27645
messages in queue but not yet preprocessed: 82

If the queue has too many messages, try to discover the source of the spam.
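For monitoring, the queue size can be pulled out of the qmail-qstat output as a plain number. This is a sketch assuming the output format shown above; the function name queue_size and the threshold of 1000 in the usage comment are arbitrary choices for illustration.

```shell
# Read qmail-qstat output on stdin and print only the number of messages
# in the queue (the "not yet preprocessed" line is ignored).
queue_size() {
    awk -F': ' '/^messages in queue:/ { print $2 }'
}

# Usage:
#   size=$(/var/qmail/bin/qmail-qstat | queue_size)
#   [ "$size" -gt 1000 ] && echo "qmail queue is large: $size messages"
```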

If mail is being sent by an authorized user rather than from a PHP script, you can run the command below to find the user that has sent the most messages (available since Plesk 8.x). Note that you must have ‘SMTP authorization’ activated on the server to see these records:

# cat /usr/local/psa/var/log/maillog |grep -i smtp_auth |grep -i user |awk '{print $11}' |sort |uniq -c |sort -n

The path to ‘maillog’ may differ depending on the OS you are using.

The next step is to use “qmail-qread,” which can be used to read the message headers:

# /var/qmail/bin/qmail-qread

18 Jul 2005 15:03:07 GMT #2996948 9073 <user@domain.com> bouncing
done remote user1@domain1.com
done remote user2@domain2.com
done remote user3@domain3.com
….

This shows the senders and recipients of messages. If a message has too many recipients, it is probably spam. Now try to find this message in the queue by its ID (#2996948 in our example):
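With a large queue it is easier to let a script count the recipients per message. The following is a sketch, assuming qmail-qread output in the layout shown above (a header line containing the #ID, followed by one "remote" or "local" line per recipient); the function name recipients_per_message is hypothetical.

```shell
# Read qmail-qread output on stdin and print "recipient_count #message_id",
# largest recipient list first.
recipients_per_message() {
    awk '
        {
            # A field of the form #<digits> marks the start of a message.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^#[0-9]+$/) { id = $i; next }
            }
            # Every following "remote"/"local" line is one recipient.
            if (id != "" && $0 ~ /(^|[ \t])(remote|local)[ \t]/) count[id]++
        }
        END { for (m in count) print count[m], m }
    ' | sort -rn
}

# Usage:
#   /var/qmail/bin/qmail-qread | recipients_per_message | head
```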

# find /var/qmail/queue/mess/ -name 2996948

Examine the message and look at the “Received” lines to find out where it was first sent from. For example, if you find:

Received: (qmail 19514 invoked by uid 10003); 13 Sep 2005 17:48:22 +0700

it means that this message was sent via CGI by the user with UID 10003. Using this UID, it is possible to find the domain:

# grep 10003 /etc/passwd
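A plain grep for the number can also match group IDs or longer UIDs that merely contain it. The sketch below extracts the UID from a Received line and looks it up by exact match on the UID column; both function names are made up for illustration.

```shell
# Read message headers on stdin and print the UID from any
# "invoked by uid N" Received line.
uid_from_received() {
    sed -n 's/.*invoked by uid \([0-9][0-9]*\).*/\1/p'
}

# Print the account name whose UID (third passwd field) equals $1.
# An alternative passwd file can be given as $2.
user_for_uid() {
    awk -F: -v uid="$1" '$3 == uid { print $1 }' "${2:-/etc/passwd}"
}

# Usage:
#   uid=$(uid_from_received < /path/to/queued/message)
#   user_for_uid "$uid"
```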

If the ‘Received’ line contains the UID of the ‘apache’ user (for example, invoked by uid 48), it means that the spam was sent through a PHP script. In this case, you can try to find the spammer using information from the spam email (from/to addresses or any other information). It is usually very difficult to discover the source of spam this way. If you are absolutely sure that a script is sending spam right now (the queue grows rapidly for no apparent reason), you can use the following command to determine which PHP scripts are running at this time:

# lsof +r 1 -p `ps axww | grep httpd | grep -v grep | awk ' { if(!str) { str=$1 } else { str=str","$1}}END{print str}'` | grep vhosts | grep php

You can also apply the KB article which describes the procedure of discovering which domains are sending mail through PHP scripts.

http://kb.sp.parallels.com/en/766

Lines in the Received section like

Received: (qmail 19622 invoked from network); 13 Sep 2005 17:52:36 +0700
Received: from external_domain.com (192.168.0.1)

mean that the message was accepted and delivered via SMTP, and that the sender is an authorized mail user.

Check the outgoing mail and look for a sending user that exists in Plesk:

cat /usr/local/psa/var/log/maillog | grep 'validuser@user@domain.com'

Output:

Nov 7 10:01:07 mail smtp_auth: SMTP user @user@domain.com : logged in from (null) [188.xx.xx.xx]

Logins by the same user from multiple IP addresses show that the spam is being sent with a valid user's credentials.
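To see how many different addresses a mailbox logs in from, the smtp_auth lines can be grouped by IP. This is a sketch that assumes log lines in the format of the sample output above (the client IP in square brackets as the last field); the function name logins_by_ip is hypothetical.

```shell
# Read maillog on stdin and print "login_count ip" for smtp_auth lines
# that mention the given mailbox, most frequent IP first.
logins_by_ip() {
    awk -v user="$1" '
        /smtp_auth/ && index($0, user) {
            addr = $NF
            gsub(/\[/, "", addr)   # strip the surrounding brackets
            gsub(/\]/, "", addr)
            count[addr]++
        }
        END { for (a in count) print count[a], a }
    ' | sort -rn
}

# Usage:
#   logins_by_ip 'user@domain.com' < /usr/local/psa/var/log/maillog
```

A long list of unrelated addresses for a single mailbox is a strong hint that its password should be changed.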

Check email passwords:

mysql -uadmin -p`cat /etc/psa/.psa.shadow` psa -e "select m.mail_name,a.password,d.name from mail m,accounts a,domains d where m.account_id=a.id and m.dom_id=d.id;"

Or


# /usr/local/psa/admin/bin/mail_auth_view

Delete the qmail mail queue


# /usr/local/psa/admin/sbin/mailqueuemng -D

Qmail Wrapper

Use this method to track down any PHP scripts that might be sending email.
http://kb.parallels.com/en/1711

1) Create a /var/qmail/bin/sendmail-wrapper script with the following content:

#!/bin/sh
(echo "X-Additional-Header: $PWD"; cat) | tee -a /var/tmp/mail.send | /var/qmail/bin/sendmail-qmail "$@"

Note that the script should consist of two lines, including ‘#!/bin/sh’.

2) Create a log file /var/tmp/mail.send and grant it “a+rw” rights; make the wrapper executable; rename old sendmail; and link it to the new wrapper:

touch /var/tmp/mail.send
chmod a+rw /var/tmp/mail.send
chmod a+x /var/qmail/bin/sendmail-wrapper
mv /var/qmail/bin/sendmail /var/qmail/bin/sendmail-qmail
ln -s /var/qmail/bin/sendmail-wrapper /var/qmail/bin/sendmail

3) Wait for an hour and change sendmail back:

rm -f /var/qmail/bin/sendmail
mv /var/qmail/bin/sendmail-qmail /var/qmail/bin/sendmail

Examine the /var/tmp/mail.send file. There should be lines starting with “X-Additional-Header:” pointing to the domain folders where the scripts that sent the mail are located.
You can see all the folders from which PHP mail scripts were run with the following command:

grep X-Additional /var/tmp/mail.send | grep `cat /etc/psa/psa.conf | grep HTTPD_VHOSTS_D | sed -e 's/HTTPD_VHOSTS_D//' `

If you see no output from the above command, it means that no mail was sent using the PHP mail() function from the Plesk virtual hosts directory.
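To rank the folders by how much mail they sent, the header lines can be counted. This is a minimal sketch that relies only on the “X-Additional-Header:” lines the wrapper above writes; the function name mail_dirs_by_count is made up for illustration.

```shell
# Print "message_count directory" for every working directory recorded
# by the sendmail wrapper, busiest directory first.
mail_dirs_by_count() {
    awk '
        /^X-Additional-Header: / {
            sub(/^X-Additional-Header: /, "")
            count[$0]++
        }
        END { for (dir in count) print count[dir], dir }
    ' "$1" | sort -rn
}

# Usage:
#   mail_dirs_by_count /var/tmp/mail.send | head
```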