Monitoring a Galera Cluster WSREP Status
Having a Galera database cluster is awesome, and monitoring each database server with Nagios using check_mysql is easy. Unfortunately, that setup only tells you that each server is accepting connections. It won't tell you if a node has dropped out of the cluster or otherwise can't serve queries correctly.
It's easy to monitor the wsrep_ready status variable using the Nagios check_mysql_query plugin and NRPE on the database servers. First, add the following to /etc/nagios/nrpe.cfg on each database server:
command[check_galera]=/usr/lib/nagios/plugins/check_mysql_query -H localhost -w 0 -c 0 -u $ARG1$ -p $ARG2$ -q "SELECT STRCMP(VARIABLE_VALUE, 'ON') FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_ready';"
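This line defines a new NRPE command, check_galera, that queries the local database server for the wsrep_ready status. Because check_mysql_query expects the query to return a number, the query uses STRCMP() to compare the value against the expected "ON". STRCMP() returns 0 when the strings match and -1 or 1 otherwise, and the -w 0 -c 0 thresholds turn any result other than 0 into a critical alert.
The database user and password are passed in from the Nagios server as NRPE command arguments, so NRPE must be built with command-argument support and have dont_blame_nrpe=1 set in nrpe.cfg. Restart the NRPE daemon after editing the file. Also note that INFORMATION_SCHEMA.GLOBAL_STATUS works on MariaDB, but MySQL 5.7 and later (and clusters built on them) expose the same data in performance_schema.global_status.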
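To make the threshold logic concrete, here is a small Python model of how the check turns the wsrep_ready value into a Nagios state. It is a sketch, not the plugin's source: the function names are hypothetical, strcmp() does a plain binary comparison (MySQL's STRCMP() follows the column's collation, so case may be ignored there), and only the simple "N" threshold form (alert if the value falls outside 0..N) is handled.

```python
"""Simplified model of the check_galera command (not check_mysql_query's source).

Assumptions: strcmp() is a binary comparison standing in for MySQL STRCMP(),
and thresholds use only the bare "N" Nagios form, meaning alert if value < 0 or > N.
"""

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def strcmp(a: str, b: str) -> int:
    """Return 0 if a == b, -1 if a sorts before b, 1 otherwise (like STRCMP)."""
    if a == b:
        return 0
    return -1 if a < b else 1


def outside_range(value: float, threshold: str) -> bool:
    """True if value lies outside a bare Nagios threshold "N" (the range 0..N)."""
    upper = float(threshold)
    return value < 0 or value > upper


def galera_state(wsrep_ready: str, warn: str = "0", crit: str = "0") -> int:
    """Map a wsrep_ready value to a Nagios exit code; critical is checked first."""
    result = strcmp(wsrep_ready, "ON")
    if outside_range(result, crit):
        return CRITICAL
    if outside_range(result, warn):
        return WARNING
    return OK


if __name__ == "__main__":
    for value in ("ON", "OFF"):
        print(value, galera_state(value))
```

Running it prints "ON 0" and "OFF 2": OK for a ready node, CRITICAL for anything else, because -1 and 1 both fall outside the 0..0 range.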
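On your Nagios server, create a service like the following. It assumes a check_nrpe command definition that forwards $ARG2$ to the remote command with -a, such as the one shipped in Debian's nagios-nrpe-plugin package: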
define service {
    hostgroup_name          galera
    service_description     Check galera cluster
    check_command           check_nrpe!check_galera!dbuser dbpass
    use                     generic-service
    notification_interval   0
}
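The check_command line packs several layers of argument passing into one string. The Python sketch below traces them, assuming the Debian-style definition check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$. The helper names are hypothetical, and the code only models the substitution; it does not call Nagios or NRPE.

```python
"""Trace how 'check_nrpe!check_galera!dbuser dbpass' reaches the remote plugin.

Assumes the Debian-style check_nrpe definition:
    check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
"""
import shlex


def nagios_args(check_command: str) -> list[str]:
    """Split 'cmd!arg1!arg2' into [arg1, arg2], the values Nagios puts in $ARGn$."""
    return check_command.split("!")[1:]


def nrpe_remote_args(arg2: str) -> list[str]:
    """$ARG2$ is substituted before the command line is split into words,
    so '-a dbuser dbpass' sends two separate arguments to NRPE."""
    return shlex.split(arg2)


def expand_nrpe_command(template: str, args: list[str]) -> str:
    """Replace $ARG1$, $ARG2$, ... in an nrpe.cfg command line with received args."""
    for i, value in enumerate(args, start=1):
        template = template.replace(f"$ARG{i}$", value)
    return template


if __name__ == "__main__":
    remote_command, arg2 = nagios_args("check_nrpe!check_galera!dbuser dbpass")
    template = "check_mysql_query -H localhost -w 0 -c 0 -u $ARG1$ -p $ARG2$"
    print(remote_command)
    print(expand_nrpe_command(template, nrpe_remote_args(arg2)))
```

This prints check_galera, then the command line with -u dbuser -p dbpass filled in. Because dont_blame_nrpe lets remote clients supply arguments, some admins prefer to hardcode the credentials in nrpe.cfg and call the command without arguments instead.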
Now Nagios will send notifications if one of your servers has a failure that impacts its ability to serve queries.