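While the Zabbix server was monitoring, the web interface reported that the Zabbix server had lost contact with the agent for more than 5 minutes. To find the root cause, the first step in troubleshooting is to check the logs of the services involved. Start with the server-side log and look for error messages. Here the server log showed that the server was running normally, so the problem most likely lay on the client side, and the next step was to check the service log on the agent host.
1. Check the log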
[root@iZbp11rfoyeescusr9ha9qZ tmp]# find / -name '*agentd.log'
/var/log/zabbix/zabbix_agentd.log
[root@iZbp11rfoyeescusr9ha9qZ tmp]# vim /var/log/zabbix/zabbix_agentd.log
23904:20170310:092458.633 Starting Zabbix Agent [Zabbix server]. Zabbix 2.2.16 (revision 64243).
23904:20170310:092458.634 using configuration file: /etc/zabbix_agentd.conf
23915:20170310:092458.636 agent #1 started [listener #1]
23918:20170310:092458.636 agent #3 started [listener #3]
23917:20170310:092458.636 agent #2 started [listener #2]
23914:20170310:092458.636 agent #0 started [collector]
23919:20170310:092458.637 agent #4 started [active checks #1]
23919:20170310:092458.637 active check configuration update from [127.0.0.1:10051] started to fail (cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)
23919:20170310:102358.983 active check configuration update from [127.0.0.1:10051] is working again
23919:20170310:102358.983 no active checks on server [127.0.0.1:10051]: host [Zabbix server] not monitored
23919:20170310:102559.020 no active checks on server [127.0.0.1:10051]: host [Zabbix server] not monitored
23919:20170310:102759.073 no active checks on server [127.0.0.1:10051]: host [Zabbix server] not monitored
23919:20170310:102959.109 no active checks on server [127.0.0.1:10051]: host [Zabbix server] not monitored
23904:20170310:103011.545 Got signal [signal:15(SIGTERM),sender_pid:26144,sender_uid:0,reason:0]. Exiting ...
23904:20170310:103011.547 Zabbix Agent stopped. Zabbix 2.2.16 (revision 64243).
26157:20170310:103011.659 Starting Zabbix Agent [Zabbix server]. Zabbix 2.2.16 (revision 64243).
26157:20170310:103011.659 using configuration file: /etc/zabbix_agentd.conf
26168:20170310:103011.663 agent #1 started [listener #1]
26172:20170310:103011.663 agent #4 started [active checks #1]
26171:20170310:103011.663 agent #3 started [listener #3]
26170:20170310:103011.663 agent #2 started [listener #2]
26166:20170310:103011.664 agent #0 started [collector]
26172:20170310:103011.667 no active checks on server [127.0.0.1:10051]: host [Zabbix server] not monitored
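The log entry at 23919:20170310:092458.637 shows that the active check configuration update from [127.0.0.1:10051] failed: the agent could not connect to the server at that address.
2. Edit the agent configuration file and change the ServerActive address to the IP address of the zabbix-server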
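To make an entry like this easier to read, it can be split into its parts. The agent log lines shown above follow the pattern PID:YYYYMMDD:HHMMSS.mmm followed by the message. Below is a minimal sketch of that split; `parse_zabbix_line` is a made-up helper name for illustration, not something Zabbix ships.

```shell
#!/bin/sh
# parse_zabbix_line is a hypothetical helper, not part of Zabbix.
# It splits one log line of the form "PID:YYYYMMDD:HHMMSS.mmm message"
# into labelled fields using POSIX parameter expansion only.
parse_zabbix_line() {
    line=$1
    prefix=${line%% *}       # text before the first space: PID:date:time
    message=${line#* }       # everything after the first space
    log_pid=${prefix%%:*}    # first colon-separated field
    rest=${prefix#*:}        # date:time
    log_date=${rest%%:*}     # second field
    log_time=${rest#*:}      # third field
    printf 'pid=%s date=%s time=%s msg=%s\n' \
        "$log_pid" "$log_date" "$log_time" "$message"
}

# Example using one entry from the log above.
parse_zabbix_line '23919:20170310:092458.637 agent #4 started [active checks #1]'
# prints: pid=23919 date=20170310 time=092458.637 msg=agent #4 started [active checks #1]
```

Read this way, the failing entry belongs to process 23919, the active checks process, and was written on 2017-03-10 at 09:24:58.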
[root@iZbp11rfoyeescusr9ha9qZ tmp]# vim /etc/zabbix/zabbix_agentd.conf
ServerActive=121.43.161.35
3. Restart the zabbix-agent service so that the configuration takes effect
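Before restarting, it may be worth confirming which ServerActive value is actually set, because commented-out example lines in the file look very similar. Note also that the log above reports /etc/zabbix_agentd.conf as the configuration file in use, while the file edited here is /etc/zabbix/zabbix_agentd.conf, so the check should be run against whichever file the agent really loads. Below is a minimal sketch; `active_server` is a hypothetical helper name.

```shell
#!/bin/sh
# active_server is a hypothetical helper, not part of Zabbix.
# It prints the value of every uncommented ServerActive line in a config file.
active_server() {
    # Lines beginning with "#" do not match, so commented examples are skipped.
    sed -n 's/^[[:space:]]*ServerActive[[:space:]]*=[[:space:]]*//p' "$1"
}

# Demonstration against a throwaway file rather than the real configuration.
demo=$(mktemp)
printf '%s\n' '# ServerActive=127.0.0.1' 'ServerActive=121.43.161.35' > "$demo"
active_server "$demo"
# prints: 121.43.161.35
rm -f "$demo"
```

On the agent host the call would be `active_server /etc/zabbix/zabbix_agentd.conf`.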
[root@iZbp11rfoyeescusr9ha9qZ tmp]# /etc/init.d/zabbix-agentd restart
Shutting down Zabbix agent: [ OK ]
Starting Zabbix agent: [ OK ]
4. Refresh the page in the browser: the server is once again receiving data on the agent's running status
Tips:
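- When reading service logs during troubleshooting, focus on entries that contain failure keywords such as "fail" or "Error". This leads to the cause quickly without reading the whole log and saves a great deal of time.
- An operations engineer carries a lot of miscellaneous information in mind and will sometimes forget the absolute path of a service or configuration file. If you remember the full name of the file or directory, the "locate <filename>" command finds its absolute path. If you cannot quite remember the name either, use find, Linux's powerful search command, with an asterisk wildcard to search the whole filesystem, for example: find / -name '*agentd.conf' (searches from / for files whose names end in agentd.conf; the quotes stop the shell from expanding the asterisk before find sees it). These are basic skills for an operations engineer and remove the need to memorize every absolute path.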
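The first tip can be sketched as a small filter. This is only an illustration: `failure_lines` is a hypothetical helper name, and the keyword list covers just the two words mentioned in the tip, so it should be extended to suit the service being examined.

```shell
#!/bin/sh
# failure_lines is a hypothetical helper for illustration.
# It prints the lines of a log file that contain "fail" or "error",
# ignoring case, each prefixed with its line number in the file.
failure_lines() {
    grep -inE 'fail|error' "$1"
}

# Demonstration on two entries copied from the agent log above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
23919:20170310:092458.637 agent #4 started [active checks #1]
23919:20170310:092458.637 active check configuration update from [127.0.0.1:10051] started to fail (cannot connect to [[127.0.0.1]:10051]: [111] Connection refused)
EOF
failure_lines "$sample"   # prints only the second entry, prefixed with "2:"
rm -f "$sample"
```

On the agent host the call would be `failure_lines /var/log/zabbix/zabbix_agentd.log`.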