Nagios Error 127: An Unusual Solution

I'm posting this for the benefit of any Nagios admin struggling to determine why their check and notification plugins fail with return code 127, as in this log entry:

Aug 12 07:01:39 nagios_host nagios: Warning: Return code of 127 for check of service 'HTTP' on host 'examplehost' was out of bounds. Make sure the plugin you're trying to run actually exists.

That error message gives sound advice for the 99% case. However, a misconfigured commands.cfg file is not the only cause of a 127 return code. After Googling and searching through many support forums, comparing working configs to my non-working one, setting LD_LIBRARY_PATH in my Nagios init script, and even writing test plugins, I finally found a big clue when I decided to strace the Nagios daemon, looking for execve/execvp calls:

[pid 23350] execve("/bin/sh", ["sh", "-c", "/bin/echo -e \"***** Nagios *****\\n\\nNotification Type: PROBLEM\\n\\nService: Puppet Client\\nHost: web-001 \\nAddress: 192.168.1.2\\nState: UNKNOWN\\nLast State: UNKNOWN\\n\\nDate/Time: Sun May 22 18:56:28 UTC 2011\\n\\nAdditional Info:\\n\\nNRPE: Unable to read output\" | /usr/bin/mail -s \"** PROBLEM Service Alert: web-001/Puppet Client is UNKNOWN **\" -a \"Reply-to: alerts@host.com\" alerts@host.com"], [/* 210 vars */]) = -1 E2BIG (Argument list too long)
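For anyone who wants to reproduce this kind of trace, here is a rough sketch of how it could be captured and filtered. The function names are my own, and it assumes the daemon process is called "nagios" and that you have permission to attach to it (usually root). Since execvp is a libc wrapper around the execve system call, tracing execve alone should be enough.

```shell
# Attach strace to the running Nagios daemon and log exec calls to a file.
# Assumption: the daemon's process name is exactly "nagios".
capture_execs() {
    # -f follows forked children, -e trace=execve limits output to exec calls,
    # -s 256 keeps more of each argument string, -o writes the trace to "$1".
    strace -f -e trace=execve -s 256 -o "$1" -p "$(pgrep -o -x nagios)"
}

# Print only the exec calls that failed with an errno,
# e.g. "= -1 E2BIG (Argument list too long)".
failed_execs() {
    grep -E 'execve\(.* = -1 E[A-Z0-9]+' "$1"
}
```

Typical use would be to run `capture_execs /tmp/nagios.trace`, wait for a check or notification to fire, interrupt it with Ctrl-C, and then run `failed_execs /tmp/nagios.trace`.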

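Aha! It wasn't the command definition after all; the exec itself was failing. After Googling for 'nagios E2BIG', I discovered that on large installations the config option 'enable_environment_macros' needs to be disabled, otherwise this condition can occur. With that option on, Nagios exports its macros as environment variables to every command it runs (note the 210 vars in the trace above), and the kernel rejects an execve whose arguments and environment together exceed its size limit. At any rate, Nagios should handle this particular error condition with a much more informative error message. Please let me know in the comments if this helps!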
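The fix itself is a single line, `enable_environment_macros=0`, in the main Nagios config file, followed by a restart of the daemon. Below is a small sketch for checking what a given config currently says. The function name is hypothetical, and the config path in the usage note is an assumption that varies by install.

```shell
# Print the last explicit value of enable_environment_macros in the given
# config file, or "unset" if the option is absent or only commented out.
env_macros_setting() {
    value=$(sed -n 's/^[[:space:]]*enable_environment_macros[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' "$1" | tail -n 1)
    echo "${value:-unset}"
}
```

For example, `env_macros_setting /usr/local/nagios/etc/nagios.cfg` (path assumed) should print 0 once the option has been disabled. A result of 1 means environment macros are still being exported to every command.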

2011-05-22
