MySQL general query log performance effects
Instead of using the general log, how about going with a query profiler? In fact, you could do query profiling without using any of MySQL's log files and while the queries are still running.
All you need is mk-query-digest (or its successor, pt-query-digest) polling the processlist.
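At its core, the wrapper script further down boils down to a single command like this (the host, credentials, and run time are just the values from the cron example below; adjust them to your environment):
mk-query-digest --processlist h=10.64.95.141,u=queryprofiler,p=queryprofiler --run-time=1190s > QP_report.txt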
I learned how to use mk-query-digest as a replacement for the slow log from this YouTube video: http://www.youtube.com/watch?v=GXwg1fiUF68&feature=colike
Here is the script I wrote to run the query digest program:
#!/bin/sh
RUNFILE=/tmp/QueriesAreBeingDigested.txt
if [ -f ${RUNFILE} ] ; then exit ; fi
MKDQ=/usr/local/sbin/mk-query-digest
RUNTIME=${1}
COPIES_TO_KEEP=${2}
DBVIP=${3}
WHICH=/usr/bin/which
DATE=`${WHICH} date`
ECHO=`${WHICH} echo`
HEAD=`${WHICH} head`
TAIL=`${WHICH} tail`
AWK=`${WHICH} awk`
SED=`${WHICH} sed`
CAT=`${WHICH} cat`
WC=`${WHICH} wc`
RM=`${WHICH} rm | ${TAIL} -1 | ${AWK} '{print $1}'`
LS=`${WHICH} ls | ${TAIL} -1 | ${AWK} '{print $1}'`
HAS_THE_DBVIP=`/sbin/ip addr show | grep "scope global secondary" | grep -c "${DBVIP}"`
# Exit if this host currently holds the DBVIP as a secondary address (see the notes below the script)
if [ ${HAS_THE_DBVIP} -eq 1 ] ; then exit ; fi
# Claim the lockfile that is checked at the top of the script; it is removed at the end
touch ${RUNFILE}
DT=`${DATE} +"%Y%m%d_%H%M%S"`
UNIQUETAG=`${ECHO} ${SSH_CLIENT}_${SSH_CONNECTION}_${DT} | ${SED} 's/\./ /g' | ${SED} 's/ //g'`
cd /root/QueryDigest
OUTFILE=QP_${DT}.txt
HOSTADDR=${DBVIP}
${MKDQ} --processlist h=${HOSTADDR},u=queryprofiler,p=queryprofiler --run-time=${RUNTIME} > ${OUTFILE}
#
# Rotate out Old Copies
#
QPFILES=QPFiles.txt
QPFILES2ZAP=QPFiles2Zap.txt
${LS} QP_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9].txt > ${QPFILES}
LINECOUNT=`${WC} -l < ${QPFILES}`
if [ ${LINECOUNT} -gt ${COPIES_TO_KEEP} ]
then
(( DIFF = LINECOUNT - COPIES_TO_KEEP ))
${HEAD} -${DIFF} < ${QPFILES} > ${QPFILES2ZAP}
for QPFILETOZAP in `${CAT} ${QPFILES2ZAP}`
do
${RM} ${QPFILETOZAP}
done
fi
rm -f ${QPFILES2ZAP}
rm -f ${QPFILES}
rm -f ${RUNFILE}
Make sure
- you have a user called queryprofiler whose password is queryprofiler and who has only the PROCESS privilege (see the example just below this list)
- you put
*/20 * * * * /root/QueryDigest/ExecQueryDigest.sh 1190s 144 10.64.95.141
in the crontab to run every 20 minutes. Each profile runs for 20 minutes less 10 seconds (1190s), the last 144 copies are kept, and the script checks whether the specified DBVIP is present on the host before running (as written, it exits when the host holds the DBVIP; alter the script to bypass or invert that check to suit your topology).
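A minimal way to create the queryprofiler user (a sketch; the '%' host wildcard and the root login are placeholders, so restrict them to the hosts that will actually run the profiler):
mysql -uroot -p -e "CREATE USER 'queryprofiler'@'%' IDENTIFIED BY 'queryprofiler'; GRANT PROCESS ON *.* TO 'queryprofiler'@'%';"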
The output is a file listing the 20 worst-running queries, ranked by total response time, i.e. the number of times each query was called × the average seconds per call (for example, the top entry below: 910 calls × 0.369656 s/call ≈ 336.4 s).
Here is sample output of the query-profiling summary from mk-query-digest:
# Rank Query ID Response time Calls R/Call Item
# ==== ================== ================ ======= ========== ====
# 1 0x812D15015AD29D33 336.3867 68.5% 910 0.369656 SELECT mt_entry mt_placement mt_category
# 2 0x99E13015BFF1E75E 25.3594 5.2% 210 0.120759 SELECT mt_entry mt_objecttag
# 3 0x5E994008E9543B29 16.1608 3.3% 46 0.351321 SELECT schedule_occurrence schedule_eventschedule schedule_event schedule_eventtype schedule_event schedule_eventtype schedule_occurrence.start
# 4 0x84DD09F0FC444677 13.3070 2.7% 23 0.578567 SELECT mt_entry
# 5 0x377E0D0898266FDD 12.0870 2.5% 116 0.104199 SELECT polls_pollquestion mt_category
# 6 0x440EBDBCEDB88725 11.5159 2.3% 21 0.548376 SELECT mt_entry
# 7 0x1DC2DFD6B658021F 10.3653 2.1% 54 0.191949 SELECT mt_entry mt_placement mt_category
# 8 0x6C6318E56E149036 8.8294 1.8% 44 0.200667 SELECT schedule_occurrence schedule_eventschedule schedule_event schedule_eventtype schedule_event schedule_eventtype schedule_occurrence.start
# 9 0x392F6DA628C7FEBD 8.5243 1.7% 9 0.947143 SELECT mt_entry mt_objecttag
# 10 0x7DD2B294CFF96961 7.3753 1.5% 70 0.105362 SELECT polls_pollresponse
# 11 0x9B9092194D3910E6 5.8124 1.2% 57 0.101973 SELECT content_specialitem content_basecontentitem advertising_product organizations_neworg content_basecontentitem_item_attributes
# 12 0xA909BF76E7051792 5.6005 1.1% 55 0.101828 SELECT mt_entry mt_objecttag mt_tag
# 13 0xEBE07AC48DB8923E 5.5195 1.1% 54 0.102213 SELECT rssfeeds_contentfeeditem
# 14 0x3E52CF0261A7C3FF 4.4676 0.9% 44 0.101536 SELECT schedule_occurrence schedule_occurrence.start
# 15 0x9D0BCD3F6731195B 4.2804 0.9% 41 0.104401 SELECT mt_entry mt_placement mt_category
# 16 0x7961BD4C76277EB7 4.0143 0.8% 18 0.223014 INSERT UNION UPDATE UNION mt_session
# 17 0xD2F486BA41E7A623 3.1448 0.6% 21 0.149754 SELECT mt_entry mt_placement mt_category mt_objecttag mt_tag
# 18 0x3B9686D98BB8E054 2.9577 0.6% 11 0.268885 SELECT mt_entry mt_objecttag mt_tag
# 19 0xBB2443BF48638319 2.7239 0.6% 9 0.302660 SELECT rssfeeds_contentfeeditem
# 20 0x3D533D57D8B466CC 2.4209 0.5% 15 0.161391 SELECT mt_entry mt_placement mt_category
Above this summary, the report prints a detail section, including a query-time histogram, for each of these top 20 worst-performing queries.
Here is the first entry's detail section as an example:
# Query 1: 0.77 QPS, 0.28x concurrency, ID 0x812D15015AD29D33 at byte 0 __
# This item is included in the report because it matches --limit.
# pct total min max avg 95% stddev median
# Count 36 910
# Exec time 58 336s 101ms 2s 370ms 992ms 230ms 393ms
# Lock time 0 0 0 0 0 0 0 0
# Users 1 mt
# Hosts 905 10.64.95.74:54707 (2), 10.64.95.74:56133 (2), 10.64.95.80:33862 (2)... 901 more
# Databases 1 mt1
# Time range 1321642802 to 1321643988
# bytes 1 1.11M 1.22k 1.41k 1.25k 1.26k 25.66 1.20k
# id 36 9.87G 11.10M 11.11M 11.11M 10.76M 0.12 10.76M
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms
# 100ms ################################################################
# 1s ###
# 10s+
# Tables
# SHOW TABLE STATUS FROM `mt1` LIKE 'mt_entry'\G
# SHOW CREATE TABLE `mt1`.`mt_entry`\G
# SHOW TABLE STATUS FROM `mt1` LIKE 'mt_placement'\G
# SHOW CREATE TABLE `mt1`.`mt_placement`\G
# SHOW TABLE STATUS FROM `mt1` LIKE 'mt_category'\G
# SHOW CREATE TABLE `mt1`.`mt_category`\G
# EXPLAIN
SELECT `mt_entry`.`entry_id`, `mt_entry`.`entry_allow_comments`, `mt_entry`.`entry_allow_pings`, `mt_entry`.`entry_atom_id`, `mt_entry`.`entry_author_id`, `mt_entry`.`entry_authored_on`, `mt_entry`.`entry_basename`, `mt_entry`.`entry_blog_id`, `mt_entry`.`entry_category_id`, `mt_entry`.`entry_class`, `mt_entry`.`entry_comment_count`, `mt_entry`.`entry_convert_breaks`, `mt_entry`.`entry_created_by`, `mt_entry`.`entry_created_on`, `mt_entry`.`entry_excerpt`, `mt_entry`.`entry_keywords`, `mt_entry`.`entry_modified_by`, `mt_entry`.`entry_modified_on`, `mt_entry`.`entry_ping_count`, `mt_entry`.`entry_pinged_urls`, `mt_entry`.`entry_status`, `mt_entry`.`entry_tangent_cache`, `mt_entry`.`entry_template_id`, `mt_entry`.`entry_text`, `mt_entry`.`entry_text_more`, `mt_entry`.`entry_title`, `mt_entry`.`entry_to_ping_urls`, `mt_entry`.`entry_week_number` FROM `mt_entry` INNER JOIN `mt_placement` ON (`mt_entry`.`entry_id` = `mt_placement`.`placement_entry_id`) INNER JOIN `mt_category` ON (`mt_placement`.`placement_category_id` = `mt_category`.`category_id`) WHERE (`mt_entry`.`entry_status` = 2 AND `mt_category`.`category_basename` IN ('business_review' /*... omitted 3 items ...*/ ) AND NOT (`mt_entry`.`entry_id` IN (53441))) ORDER BY `mt_entry`.`entry_authored_on` DESC LIMIT 4\G
There is no noticeable performance impact from doing this: a single DB connection is maintained to poll the processlist for the duration you specify, and the hundreds of processlist queries issued over that one connection are completely lightweight. In light of this, you will not need an NFS share or any special hardware for query performance analysis.
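If you want to confirm that while a digest run is active, you can check that the profiler really does hold just one connection (the host and admin credentials here are illustrative):
mysql -h 10.64.95.141 -u root -p -e "SHOW PROCESSLIST" | grep -c queryprofiler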
Give it a Try !!!
UPDATE
mk-query-digest can use either the processlist (via a live DB connection) or tcpdump output (via a pipe or file). Here are the relevant options from its documentation:
--processlist

  type: DSN

  Poll this DSN's processlist for queries, with "--interval" sleep between.
  If the connection fails, mk-query-digest tries to reopen it once per
  second. See also "--mirror".

--interval

  type: float; default: .1

  How frequently to poll the processlist, in seconds.

--type tcpdump

  mk-query-digest does not actually watch the network (i.e. it does NOT
  "sniff packets"). Instead, it's just parsing the output of tcpdump. You
  are responsible for generating this output; mk-query-digest does not do
  it for you. Then you send this to mk-query-digest as you would any log
  file: as files on the command line or to STDIN.

  The parser expects the input to be formatted with the following options:
  "-x -n -q -tttt". For example, if you want to capture output from your
  local machine, you can do something like

    tcpdump -i eth0 port 3306 -s 65535 -c 1000 -x -n -q -tttt \
      | mk-query-digest --type tcpdump

  The other tcpdump parameters, such as -s, -c, and -i, are up to you.
  Just make sure the output looks like this:

    2009-04-12 09:50:16.804849 IP 127.0.0.1.42167 > 127.0.0.1.3306: tcp 37
    0x0000: 4508 0059 6eb2 4000 4006 cde2 7f00 0001
    0x0010: ....

  Remember tcpdump has a handy -c option to stop after it captures some
  number of packets! That's very useful for testing your tcpdump command.
After 8.5 years, I decided to update this post with a version of my wrapper script using pt-query-digest instead of mk-query-digest:
#!/bin/bash
#
# qdlive : Wrapper Around pt-query-digest
# BASE_FOLDER : Target Folder for This Wrapper's Output
# PT_QUERY_DIGEST : Absolute path to the pt-query-digest script
# MYCNF : Absolute path to my.cnf with User Credentials to run pt-query-digest and mysqladmin
# QD_FOLDER : Folder Where pt-query-digest Output are Stored
#
BASE_FOLDER=${HOME}/qdlive
MYCNF=${BASE_FOLDER}/my.cnf
ERRLOG=${BASE_FOLDER}/err.log
PT_QUERY_DIGEST=/usr/bin/pt-query-digest
QD_FOLDER=${BASE_FOLDER}/reports
mkdir -p ${QD_FOLDER}
cd ${BASE_FOLDER}
#
# 1st Parameter is the Number of Minutes for pt-query-digest to run
# 2nd Parameter is the Number of Copies of the pt-query-digest Output to Keep
#
[ "${1}" == "" ] && exit
[ "${2}" == "" ] && exit
[ ! "${3}" == "" ] && exit
OK1=`echo "${1}" | grep -c "^[1-9]$"`
OK2=`echo "${1}" | grep -c "^[1-9][0-9]$"`
OK3=`echo "${1}" | grep -c "^[1-9][0-9][0-9]$"`
(( OK = OK1 + OK2 + OK3 ))
if [ ${OK} -eq 0 ]
then
DT=`date +"%Y-%m-%d %H:%M:%S"`
echo "${DT} : Invalid Minutes (${1})" >> ${ERRLOG}
exit
fi
OK=1
[ ${1} -lt 2 ] && OK=0
[ ${1} -gt 60 ] && OK=0
if [ ${OK} -eq 0 ]
then
DT=`date +"%Y-%m-%d %H:%M:%S"`
echo "${DT} : Invalid Minutes (${1})" >> ${ERRLOG}
exit
fi
RUNTIME_MINUTES=${1}
(( RUNTIME_SECONDS = RUNTIME_MINUTES * 60 ))
OK1=`echo "${2}" | grep -c "^[1-9]$"`
OK2=`echo "${2}" | grep -c "^[1-9][0-9]$"`
OK3=`echo "${2}" | grep -c "^[1-9][0-9][0-9]$"`
(( OK = OK1 + OK2 + OK3 ))
if [ ${OK} -eq 0 ]
then
DT=`date +"%Y-%m-%d %H:%M:%S"`
echo "${DT} : Invalid Copies (${2})" >> ${ERRLOG}
exit
fi
OK=1
[ ${2} -lt 7 ] && OK=0
[ ${2} -gt 300 ] && OK=0
if [ ${OK} -eq 0 ]
then
DT=`date +"%Y-%m-%d %H:%M:%S"`
echo "${DT} : Invalid Copies (${2})" >> ${ERRLOG}
exit
fi
COPIES_TO_KEEP=${2}
#
# Parse my.cnf
#
# [mysql]
# host = ....
# user = ....
# password = ...
#
MYSQL_HOST=`grep ^host ${MYCNF} | awk '{print $3}'`
# Fall back to localhost if my.cnf does not specify a host
[ "${MYSQL_HOST}" == "" ] && MYSQL_HOST=localhost
MYSQL_USER=`grep ^user ${MYCNF} | awk '{print $3}'`
MYSQL_PASS=`grep ^password ${MYCNF} | awk '{print $3}'`
MYSQL_PORT=3306
MYSQL_AUTH="--defaults-file=${MYCNF}"
PTOOL_AUTH="-F ${MYCNF}"
#
# Make Sure mysqld is running
#
MYSQL_RUNNING=`mysqladmin ${MYSQL_AUTH} ping 2>&1 | grep -c "mysqld is alive"`
if [ ${MYSQL_RUNNING} -eq 0 ]
then
DT=`date +"%Y-%m-%d %H:%M:%S"`
echo "${DT} : mysqld Not Running" >> ${ERRLOG}
exit
fi
#
# Execute Query Digest
#
DT=`date +"%Y%m%d_%a_%H%M%S"`
QD_OUTPUT=${QD_FOLDER}/qd_${DT}_${RUNTIME_MINUTES}m.rpt
${PT_QUERY_DIGEST} --processlist h=${MYSQL_HOST} ${PTOOL_AUTH} --run-time=${RUNTIME_SECONDS}s >${QD_OUTPUT} 2>&1
#
# Rotate Old Reports
#
# Rotate in the reports folder, where the .rpt files actually live
cd ${QD_FOLDER}
COPIES_ON_HAND=`ls | grep -c "rpt$"`
if [ ${COPIES_ON_HAND} -gt ${COPIES_TO_KEEP} ]
then
(( COPIES_TO_ZAP = COPIES_ON_HAND - COPIES_TO_KEEP ))
FILELIST=""
SPC=""
for FIL in `ls *.rpt | head -${COPIES_TO_ZAP}`
do
FILELIST="${FILELIST}${SPC}${FIL}"
SPC=" "
done
for FIL in ${FILELIST} ; do rm -f ${FIL} ; done
fi
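Like the original script, this one is meant to run from cron. Here is a sketch of a crontab entry, assuming the wrapper is saved as /root/qdlive/qdlive.sh (the filename and path are assumptions) and that ${HOME}/qdlive/my.cnf exists in the format shown in the comment block above. It profiles for 19 minutes every 20 minutes and keeps the last 144 reports, both of which satisfy the script's validation ranges:
*/20 * * * * /root/qdlive/qdlive.sh 19 144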
Like Rolando, I would recommend using pt-query-digest to capture this data, but instead of using the --processlist option, capture data with tcpdump. The processlist method will not capture everything and timing data will only be as accurate as the granularity of your interval. tcpdump will get every query. What you get out of the processlist that you don't get from tcpdump is additional information about what state the queries were in.
Here is how to do this. It could easily be scripted:
tcpdump -s 65535 -x -nn -q -tttt -i any -c 9999999 port 3306 | gzip -c > /tmp/tcpdump.txt.gz
gunzip -c /tmp/tcpdump.txt.gz | pt-query-digest --type tcpdump > /tmp/digest.txt
The only issue is that (to my knowledge) tcpdump cannot capture data over an interval of time - it only understands packets. So, either make the value of -c (the number of packets to capture) some very large number, or kill the tcpdump manually after 24 hours...again, that can be scripted. You can use pt-query-digest filters to select for the exact time range, if that is important.
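One way to bound the capture to a time window rather than a packet count (a sketch, assuming GNU coreutils' timeout is available) is to let timeout kill tcpdump after 24 hours:
timeout 86400 tcpdump -s 65535 -x -nn -q -tttt -i any port 3306 | gzip -c > /tmp/tcpdump.txt.gz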
I would output tcpdump to some destination that has a lot of space. In the example above, I am gzipping, but that isn't necessary if you have sufficient space. pt-query-digest is very CPU intensive, so I wouldn't run the pt-query-digest command right on the production server - copy the data somewhere else and work with it there.
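For example (the analysis hostname and paths are hypothetical):
scp /tmp/tcpdump.txt.gz analysis-host:/data/
ssh analysis-host 'gunzip -c /data/tcpdump.txt.gz | pt-query-digest --type tcpdump > /data/digest.txt'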
EDIT: I forgot to mention that if you don't want the summary data, you can extract just the list of queries by adding --no-report --print.
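For example, reusing the capture from above:
gunzip -c /tmp/tcpdump.txt.gz | pt-query-digest --type tcpdump --no-report --print > /tmp/queries.txt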