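Nagios + Cacti is not as easy to use as Zabbix, but for service monitoring that only needs alerting and no graphs, Nagios is a good fit. After a recent IDC migration, I redeployed our old Nagios + Cacti environment from scratch.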
Nagios:
- Preparation:
apt-get install autoconf gcc libc6 build-essential bc gawk dc gettext \
libmcrypt-dev libssl-dev make unzip apache2 apache2-utils php5 libgd2-xpm-dev
/usr/sbin/useradd -m -s /bin/bash nagios # create the nagios user
/usr/sbin/groupadd nagcmd # create the nagcmd group, used to run external commands such as nrpe
/usr/sbin/usermod -a -G nagcmd nagios
/usr/sbin/usermod -a -G nagcmd www-data
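To confirm that the group changes took effect, a quick membership check can help. This is only a minimal sketch: the helper name `user_in_group` is my own invention, not something Nagios ships.

```shell
#!/bin/sh
# user_in_group USER GROUP: succeed if USER belongs to GROUP.
# Hypothetical helper for illustration; it only reads the output of `id`.
user_in_group() {
    # id -nG prints the user's group names separated by spaces
    for g in $(id -nG "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Both accounts should be members of nagcmd after the usermod calls
for u in nagios www-data; do
    if user_in_group "$u" nagcmd; then
        echo "$u is in nagcmd"
    else
        echo "$u is NOT in nagcmd"
    fi
done
```

If www-data is reported as missing, the web interface will probably be unable to write to the external command file.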
- Installation:
tar zxvf nagios-4.3.1.tar.gz
cd nagios-4.3.1
./configure --prefix=/opt/nagios --with-command-group=nagcmd --with-httpd-conf=/etc/apache2/sites-enabled
make all
make install
make install-init
make install-config
make install-commandmode
update-rc.d nagios defaults # the make targets above install the binaries, init script and sample configs; this enables start on boot
- Nagios directory layout:
root@10.1.1.208:nagios# ls
bin etc libexec log sbin share var
The main Nagios configuration files live under etc, and the plugins go under libexec.
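- Configuring Nagios:
Our company's Nagios mainly monitors server hardware status, such as whether disks are healthy, and every check goes through NRPE to reduce the load on the monitoring server itself. Nagios configuration is modular: you can register as many object configuration files as you need in the main nagios.cfg.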
# You can specify individual object config files as shown below:
cfg_file=/opt/nagios/etc/objects/commands.cfg
cfg_file=/opt/nagios/etc/objects/contacts.cfg
cfg_file=/opt/nagios/etc/objects/timeperiods.cfg
cfg_file=/opt/nagios/etc/objects/templates.cfg
#
cfg_file=/opt/nagios/etc/objects/service.cfg
cfg_file=/opt/nagios/etc/objects/group.cfg
# Definitions for monitoring the local (Linux) host
#cfg_file=/opt/nagios/etc/objects/localhost.cfg
cfg_file=/opt/nagios/etc/objects/host_debian.cfg
cfg_file=/opt/nagios/etc/objects/host_centos.cfg
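Since every object file has to be registered by hand, it is easy to list one that does not exist yet. The sketch below checks for that before running the full Nagios validation. The function names `list_cfg_files` and `check_cfg_files` are hypothetical, and the loop assumes paths without spaces.

```shell
#!/bin/sh
# list_cfg_files MAIN_CFG: print every active cfg_file= path in MAIN_CFG.
# Commented-out entries (lines starting with #) are skipped.
list_cfg_files() {
    sed -n 's/^[[:space:]]*cfg_file=//p' "$1"
}

# check_cfg_files MAIN_CFG: report registered object files that are missing.
# Returns non-zero if at least one file does not exist.
check_cfg_files() {
    missing=0
    for f in $(list_cfg_files "$1"); do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (path taken from the install prefix used in this post):
#   check_cfg_files /opt/nagios/etc/nagios.cfg
```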
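Then edit the corresponding files. Suppose I want to add a Linux server and monitor its disks; the steps are as follows:
1. Edit commands.cfg and add the corresponding command:
# check hardware Disk
define command{
    command_name check_storage_disk_nrpe
    command_line /opt/nagios/libexec/check_storage_disk_nrpe $HOSTADDRESS$ check_storage_disk
}
Put the matching script under libexec. Roughly speaking, Nagios asks the remote machine (through NRPE) to run the check_storage_disk command, and check_storage_disk is a monitoring script that lives on the remote machine.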
#!/bin/bash
# Wrapper around check_nrpe: run a remote NRPE command and map the result to a Nagios state.
PLUGINS=/opt/nagios/libexec
CHECK_NRPE=$PLUGINS/check_nrpe
host=$1
comm=$2
if [ $# -lt 2 ]; then
    echo "Usage: $0 host command"
    exit 2
fi
# -n disables SSL; the remote NRPE daemon listens on port 57000
res=$("$CHECK_NRPE" -H "$host" -n -p 57000 -c "$comm")
if [ $? -ne 0 ]; then
    # $res is quoted so that multi-word output does not break the test
    if [ "CHECK_NRPE: Socket timeout after 10 seconds." == "${res}" ]; then
        # a timeout exits 0, so Nagios shows it as OK rather than as a disk fault
        echo "connect failed"
        exit 0
    else
        echo "Check Storage UNKNOWN"
        exit 3
    fi
fi
if [ "${res}" == "Storage Disk Normal" ]; then
    echo "Check Storage OK"
    exit 0
else
    echo "${res}"
    exit 2
fi
The NRPE plugin can be downloaded from nagios.org.
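Nagios decides a service's state from the plugin's exit code: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. To try the wrapper's decision logic without a live NRPE daemon, it can be pulled into a function and fed canned inputs. This is a sketch, not the script I actually deployed: `map_storage_result` is a hypothetical name, and it matches the timeout message by prefix instead of by the exact string.

```shell
#!/bin/sh
# map_storage_result RC OUTPUT
# Mirrors the decision logic of the wrapper: RC is the exit status of
# check_nrpe, OUTPUT is the text it printed. Prints the plugin message and
# returns the state code Nagios should see (0 OK, 2 CRITICAL, 3 UNKNOWN).
map_storage_result() {
    rc=$1
    out=$2
    if [ "$rc" -ne 0 ]; then
        case $out in
            "CHECK_NRPE: Socket timeout"*) echo "connect failed"; return 0 ;;
            *) echo "Check Storage UNKNOWN"; return 3 ;;
        esac
    fi
    if [ "$out" = "Storage Disk Normal" ]; then
        echo "Check Storage OK"
        return 0
    fi
    # Anything else is passed through and reported as CRITICAL
    echo "$out"
    return 2
}

# Dry run with a canned healthy response
map_storage_result 0 "Storage Disk Normal"
echo "state=$?"
```

Because a timeout returns 0 here, an unreachable host will not raise a disk alert; whether that is what you want depends on whether host reachability is already checked separately.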
Then register the service in service.cfg:
define service{
    use local-service
    hostgroup_name debian_servers
    service_description hardware_disk_check
    check_command check_storage_disk_nrpe
}
Then create the host and host group definitions:
define hostgroup{
    hostgroup_name debian_servers
    alias servers
    members test
}
define host{
    use linux-server
    host_name test
    alias 01
    address 192.168.1.1
}
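When many servers need to be added, typing each host block by hand gets tedious. A small generator like the one below can help. It is only a sketch: `make_host_def` is a hypothetical helper, and the host names and addresses in the example are made up.

```shell
#!/bin/sh
# make_host_def NAME ALIAS ADDRESS: print a host object definition.
# Hypothetical helper; the linux-server template comes from templates.cfg.
make_host_def() {
    cat <<EOF
define host{
    use        linux-server
    host_name  $1
    alias      $2
    address    $3
}
EOF
}

# Example: emit definitions for two hosts (names and addresses are made up)
make_host_def web01 "web server 01" 192.168.1.11
make_host_def web02 "web server 02" 192.168.1.12
```

Redirect the output into a host file such as host_debian.cfg, and remember to add each new host_name to the members line of the host group, otherwise the service check will not apply to it.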
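Nagios login is authenticated through Apache htpasswd, which is simple: just change the password of the CGI user. To change the Nagios login user, you need to update Apache's htpasswd file and also the user authorization entries in cgi.cfg.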
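The cgi.cfg side of that change means editing every `authorized_for_*` directive. Here is a sketch of doing it in one pass. `set_cgi_user` is a hypothetical helper, it replaces any existing comma-separated user list with the single new user, and it assumes the user name contains no `/` or `&` characters.

```shell
#!/bin/sh
# set_cgi_user CGI_CFG USER
# Point every authorized_for_* directive in CGI_CFG at USER.
# Hypothetical helper; it rewrites the file via a temporary copy.
set_cgi_user() {
    cfg=$1
    user=$2
    tmp="$cfg.tmp.$$"
    sed "s/^\(authorized_for_[a-z_]*\)=.*/\1=$user/" "$cfg" > "$tmp" && mv "$tmp" "$cfg"
}

# Usage, assuming the htpasswd file lives under the install prefix
# (the file name htpasswd.users is an assumption, check your Apache config):
#   htpasswd /opt/nagios/etc/htpasswd.users monitor
#   set_cgi_user /opt/nagios/etc/cgi.cfg monitor
```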
Then check the Nagios configuration:
/opt/nagios/bin/nagios -v /opt/nagios/etc/nagios.cfg
Then start Nagios.
In my case the compiled install did not leave a usable service startup script under init by default,
so here is one:
#!/bin/sh
#
# chkconfig: 345 99 01
# description: Nagios network monitor
#
# File : nagios
#
# Author : Jorge Sanchez Aymar (jsanchez@lanchile.cl)
#
# Changelog :
#
# 1999-07-09 Karl DeBisschop <kdebisschop@infoplease.com>
# - setup for autoconf
# - add reload function
# 1999-08-06 Ethan Galstad <egalstad@nagios.org>
# - Added configuration info for use with RedHat's chkconfig tool
# per Fran Boon's suggestion
# 1999-08-13 Jim Popovitch <jimpop@rocketship.com>
# - added variable for nagios/var directory
# - cd into nagios/var directory before creating tmp files on startup
# 1999-08-16 Ethan Galstad <egalstad@nagios.org>
# - Added test for rc.d directory as suggested by Karl DeBisschop
# 2000-07-23 Karl DeBisschop <kdebisschop@users.sourceforge.net>
# - Clean out redhat macros and other dependencies
# 2003-01-11 Ethan Galstad <egalstad@nagios.org>
# - Updated su syntax (Gary Miller)
#
# Description: Starts and stops the Nagios monitor
# used to provide network services status.
#
status_nagios ()
{
    if test -x $NagiosCGIDir/daemonchk.cgi; then
        if $NagiosCGIDir/daemonchk.cgi -l $NagiosRunFile; then
            return 0
        else
            return 1
        fi
    else
        if ps -p $NagiosPID > /dev/null 2>&1; then
            return 0
        else
            return 1
        fi
    fi
    return 1
}
printstatus_nagios()
{
    if status_nagios $1 $2; then
        echo "nagios (pid $NagiosPID) is running..."
    else
        echo "nagios is not running"
    fi
}
killproc_nagios ()
{
    kill $2 $NagiosPID
}
pid_nagios ()
{
    if test ! -f $NagiosRunFile; then
        echo "No lock file found in $NagiosRunFile"
        exit 1
    fi
    NagiosPID=`head -n 1 $NagiosRunFile`
}
# Source function library
# Solaris doesn't have an rc.d directory, so do a test first
if [ -f /etc/rc.d/init.d/functions ]; then
    . /etc/rc.d/init.d/functions
elif [ -f /etc/init.d/functions ]; then
    . /etc/init.d/functions
fi
prefix=/opt/nagios
exec_prefix=${prefix}
NagiosBin=${exec_prefix}/bin/nagios
NagiosCfgFile=${prefix}/etc/nagios.cfg
NagiosStatusFile=${prefix}/var/status.dat
NagiosRetentionFile=${prefix}/var/retention.dat
NagiosCommandFile=${prefix}/var/rw/nagios.cmd
NagiosVarDir=${prefix}/var
NagiosRunFile=${prefix}/var/nagios.lock
NagiosLockDir=/var/lock/subsys
NagiosLockFile=nagios
NagiosCGIDir=${exec_prefix}/sbin
NagiosUser=nagios
NagiosGroup=nagios
# Check that nagios exists.
if [ ! -f $NagiosBin ]; then
    echo "Executable file $NagiosBin not found. Exiting."
    exit 1
fi
# Check that nagios.cfg exists.
if [ ! -f $NagiosCfgFile ]; then
    echo "Configuration file $NagiosCfgFile not found. Exiting."
    exit 1
fi
# See how we were called.
case "$1" in
    start)
        echo -n "Starting nagios:"
        $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
        if [ $? -eq 0 ]; then
            su - $NagiosUser -c "touch $NagiosVarDir/nagios.log $NagiosRetentionFile"
            rm -f $NagiosCommandFile
            touch $NagiosRunFile
            chown $NagiosUser:$NagiosGroup $NagiosRunFile
            $NagiosBin -d $NagiosCfgFile
            if [ -d $NagiosLockDir ]; then touch $NagiosLockDir/$NagiosLockFile; fi
            echo " done."
            exit 0
        else
            echo "CONFIG ERROR! Start aborted. Check your Nagios configuration."
            exit 1
        fi
        ;;
    stop)
        echo -n "Stopping nagios: "
        pid_nagios
        killproc_nagios nagios
        # now we have to wait for nagios to exit and remove its
        # own NagiosRunFile, otherwise a following "start" could
        # happen, and then the exiting nagios will remove the
        # new NagiosRunFile, allowing multiple nagios daemons
        # to (sooner or later) run - John Sellens
        #echo -n 'Waiting for nagios to exit .'
        for i in 1 2 3 4 5 6 7 8 9 10 ; do
            if status_nagios > /dev/null; then
                echo -n '.'
                sleep 1
            else
                break
            fi
        done
        if status_nagios > /dev/null; then
            echo ''
            echo 'Warning - nagios did not exit in a timely manner'
        else
            echo 'done.'
        fi
        rm -f $NagiosStatusFile $NagiosRunFile $NagiosLockDir/$NagiosLockFile $NagiosCommandFile
        ;;
    status)
        pid_nagios
        printstatus_nagios nagios
        ;;
    checkconfig)
        printf "Running configuration check..."
        $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
        if [ $? -eq 0 ]; then
            echo " OK."
        else
            echo " CONFIG ERROR! Check your Nagios configuration."
            exit 1
        fi
        ;;
    restart)
        printf "Running configuration check..."
        $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
        if [ $? -eq 0 ]; then
            echo "done."
            $0 stop
            $0 start
        else
            echo " CONFIG ERROR! Restart aborted. Check your Nagios configuration."
            exit 1
        fi
        ;;
    reload|force-reload)
        printf "Running configuration check..."
        $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
        if [ $? -eq 0 ]; then
            echo "done."
            if test ! -f $NagiosRunFile; then
                $0 start
            else
                pid_nagios
                if status_nagios > /dev/null; then
                    printf "Reloading nagios configuration..."
                    killproc_nagios nagios -HUP
                    echo "done"
                else
                    $0 stop
                    $0 start
                fi
            fi
        else
            echo " CONFIG ERROR! Reload aborted. Check your Nagios configuration."
            exit 1
        fi
        ;;
    *)
        echo "Usage: nagios {start|stop|restart|reload|force-reload|status|checkconfig}"
        exit 1
        ;;
esac
# End of this script
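Then log in and check that everything works.
Cacti:
Cacti is used for graphing. Nagios can actually produce graphs through pnp4nagios, but the experience is not great. Cacti is quite good for customized monitoring graphs, even though both rely on rrdtool underneath.
- Preparation: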
apt-get install rrdtool php5 mysql-server
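PHP5 actually needs more packages than this; more on that later.
Download Cacti, extract it and enter the directory, then log in to MySQL and import the Cacti tables: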
mysql> create database cacti;
Query OK, 1 row affected (0.00 sec)
mysql> use cacti;
mysql> source cacti.sql;
mysql> GRANT ALL PRIVILEGES ON cacti.* TO 'cacti'@'127.0.0.1' IDENTIFIED BY 'cacti';
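For repeat deployments, the same statements can be generated from a script instead of typed into the mysql prompt. This is a sketch under a few assumptions: `cacti_db_sql` is a hypothetical helper, `IF NOT EXISTS` is my addition so that reruns do not fail, and the `GRANT ... IDENTIFIED BY` form only works on MySQL versions before 8.0.

```shell
#!/bin/sh
# cacti_db_sql DB USER PASS HOST: print the SQL that creates the database
# and grants access. Hypothetical helper; the statements mirror the
# interactive session, using the pre-MySQL-8 GRANT ... IDENTIFIED BY form.
cacti_db_sql() {
    cat <<EOF
CREATE DATABASE IF NOT EXISTS $1;
GRANT ALL PRIVILEGES ON $1.* TO '$2'@'$4' IDENTIFIED BY '$3';
EOF
}

# Print the statements for review; pipe them into mysql to apply:
#   cacti_db_sql cacti cacti cacti 127.0.0.1 | mysql -u root -p
#   mysql -u root -p cacti < cacti.sql
cacti_db_sql cacti cacti cacti 127.0.0.1
```

Using a password other than the user name is a good idea on anything that is not a throwaway test box.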
Edit the configuration file:
vi include/config.php
$database_type = 'mysql';
$database_default = 'cacti';
$database_hostname = '127.0.0.1';
$database_username = 'cacti';
$database_password = 'cacti';
$database_port = '3306';
$database_ssl = false;
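After that, open ip/cacti in a browser and the installation wizard appears:
The default user is admin and the password is admin.
The wizard lists any missing packages; just install them:
Newer versions of Cacti have an issue with MySQL time zone permissions, which is the error shown in the screenshot above. It can be fixed as follows (the GRANT runs in the mysql client, the second command in the shell):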
mysql> GRANT SELECT ON mysql.time_zone_name TO cacti@'127.0.0.1';
mysql_tzinfo_to_sql /usr/share/zoneinfo/ | mysql -u root -p mysql
After that, click Next and the installation completes.
The next step is to configure SNMP for monitoring and graphing.