Monitoring a Galera Cluster WSREP Status

Having a Galera database cluster is awesome, and monitoring each database server with Nagios using check_mysql is an easy thing to do. Unfortunately, that check only confirms that each server accepts connections. It won't tell you when a node is up but not ready to serve queries as part of the cluster.

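It's easy to monitor the wsrep_ready status variable using the Nagios check_mysql_query plugin and NRPE on the database servers. The command takes the database user and password as arguments, so NRPE must accept command arguments (dont_blame_nrpe=1 in nrpe.cfg, on a build with command-argument support). With that in place, add the following to your /etc/nagios/nrpe.cfg file on each database server: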

command[check_galera]=/usr/lib/nagios/plugins/check_mysql_query -H localhost -w 0 -c 0 -u $ARG1$ -p $ARG2$ -q "SELECT STRCMP(VARIABLE_VALUE, \"ON\") FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_ready';"

This config line creates a new command for the NRPE daemon that queries the local database server for the "wsrep_ready" status. Since check_mysql_query requires that the query return a numeric value, we use STRCMP to compare the status to the expected value of "ON": it returns 0 on a match and -1 or 1 otherwise. The -w 0 -c 0 thresholds then tell the plugin to throw a critical notification for any result other than 0.
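
To make the threshold logic concrete, here is a rough Python sketch of how the result of the query maps to a Nagios state. It is a simplified model, not the plugin's actual code: it assumes a bare threshold N means "the range 0 to N, alert on anything outside it", and the helper names (strcmp, outside_range, check_state) are made up for illustration.

```python
def strcmp(a: str, b: str) -> int:
    """Simplified stand-in for SQL STRCMP: returns -1, 0, or 1."""
    return (a > b) - (a < b)


def outside_range(value: float, upper: float) -> bool:
    """Model a bare threshold N as the range 0..N; values outside it alert."""
    return value < 0 or value > upper


def check_state(value: float, warn: float, crit: float) -> str:
    # Critical is evaluated first, so it wins when both thresholds are crossed.
    if outside_range(value, crit):
        return "CRITICAL"
    if outside_range(value, warn):
        return "WARNING"
    return "OK"


for status in ("ON", "OFF"):
    result = strcmp(status, "ON")
    print(status, result, check_state(result, warn=0, crit=0))
```

With both thresholds set to 0, the only value that stays inside the range is 0 itself, which is why a wsrep_ready value of "ON" is OK and anything else goes straight to critical.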

On your Nagios server, create a service that looks like the following, replacing dbuser and dbpass with the credentials of a database user that is allowed to run the query:

define service {
        hostgroup_name                  galera
        service_description             Check galera cluster
        check_command                   check_nrpe!check_galera!dbuser dbpass
        use                             generic-service
        notification_interval           0
}
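
Before waiting for Nagios to schedule the service, it can help to run the same check by hand from the Nagios server. The sketch below is one way to do that from Python. It assumes check_nrpe lives in the same Debian-style plugin directory as check_mysql_query, and the function names and the example hostname are hypothetical.

```python
import subprocess

# Assumed plugin location, matching the path used for check_mysql_query above.
CHECK_NRPE = "/usr/lib/nagios/plugins/check_nrpe"


def build_check(host: str, db_user: str, db_pass: str) -> list[str]:
    """Return the argv that asks NRPE on `host` to run the check_galera command."""
    return [CHECK_NRPE, "-H", host, "-c", "check_galera", "-a", db_user, db_pass]


def run_check(host: str, db_user: str, db_pass: str) -> tuple[int, str]:
    """Run the check and return (exit code, plugin output).

    Nagios plugins exit with 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN.
    """
    proc = subprocess.run(
        build_check(host, db_user, db_pass),
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()
```

For example, run_check("db1.example.com", "dbuser", "dbpass") should come back with exit code 0 on a healthy node. If NRPE is not set up to accept arguments, expect an error from check_nrpe instead of a result from the query.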

Now Nagios can throw notifications if one of your servers has a failure that impacts its ability to fulfill queries.

Posted by Tony on Feb 08, 2016 | Servers, Linux Tricks