Handling Nagios Client Monitoring System

From NEOSYS Technical Support Wiki

Revision as of 08:50, 13 February 2014

http://monitor.neosys.com/nagios3

How to handle a service error

  1. Nagios Service Info: reach this page by any route, e.g. from Service Problems, then click on the service name (not the host name).
  2. Under Service Commands, click "Acknowledge this service problem" (only services with status Warning or Critical have this option).
  3. Enter a note explaining to yourself and your co-workers how the problem is being handled and when to follow up.

Notifications will be automatically resumed once the service becomes OK again.

The "Disable notifications" command is not quite the same: it shows as red on the tactical summary screen.
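The acknowledgement steps above use the web interface. For reference, Nagios also accepts the same action as an external command line. The following is a minimal sketch, assuming the standard ACKNOWLEDGE_SVC_PROBLEM field order; the host, service, author and comment values are hypothetical, and the sticky/notify/persistent flag values should be checked against the documentation for the installed Nagios version.

```python
import time


def acknowledge_command(host, service, author, comment,
                        sticky=2, notify=1, persistent=1, now=None):
    """Build a Nagios ACKNOWLEDGE_SVC_PROBLEM external command line.

    The flag defaults are assumptions; verify them for your Nagios version.
    """
    timestamp = int(time.time() if now is None else now)
    return "[{0}] ACKNOWLEDGE_SVC_PROBLEM;{1};{2};{3};{4};{5};{6};{7}".format(
        timestamp, host, service, sticky, notify, persistent, author, comment)


# Hypothetical example values, printed rather than sent to Nagios
print(acknowledge_command("client1", "NEOSYS", "support",
                          "Investigating, follow up tomorrow"))
```

The function only builds the text of the command; writing it to the Nagios command file is left out on purpose because the file location depends on the installation.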

Troubleshooting NAGIOS generally

Resolving “CRITICAL – Socket timeout after 10 secs” error message on NAGIOS

Error Message

Vm3nagios.jpg

Problem

NAGIOS is not updating services such as CPU Load, Drive Space C:, Drive Space D:, Explorer, Memory Usage, etc.

Solution

Open Windows Task Manager and kill any nscp.exe process. Then restart NSClient++ from the desktop or by going to Start > Programs > NSClient++.
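The same kill-and-restart can be scripted. The following is a rough sketch, assuming NSClient++ is installed as a Windows service; the service name "nscp" is an assumption and differs between NSClient++ versions, so check it in the Services console first. By default the function only prints the commands it would run.

```python
import subprocess


def nsclient_restart_commands(service_name="nscp"):
    """Return the Windows commands that kill nscp.exe and start the service.

    service_name is an assumption; confirm the real name on the server.
    """
    return [
        ["taskkill", "/F", "/IM", "nscp.exe"],  # force-kill any hung process
        ["net", "start", service_name],         # start the service again
    ]


def restart_nsclient(service_name="nscp", dry_run=True):
    """Print the commands (dry run) or execute them on the Windows server."""
    commands = nsclient_restart_commands(service_name)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=False: taskkill exits non-zero when no nscp.exe is running
            subprocess.run(command, check=False)
    return commands


restart_nsclient(dry_run=True)
```

Call restart_nsclient(dry_run=False) from an administrator prompt on the client server to actually run the commands.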

Resolving “NEOSYS has not checked in” error message

Error Message Explained

You notice a problem on NAGIOS indicating that NEOSYS has not checked in on a particular client server. This happens because NEOSYS is not updating NAGIOS.

Possible Causes & Solutions

The maintenance window is left open

Make sure the maintenance window has not been left open on the server.

The NEOSYS process is NOT running on the server

Start the NEOSYS process and wait 10 minutes for NEOSYS to check in to NAGIOS.

The NEOSYS process is running on the server but still cannot connect to NAGIOS

NEOSYS connects to NAGIOS using HTTP. NEOSYS automatically detects and uses any HTTP proxy configured in Internet Explorer. If Internet Explorer on the server can reach the internet, then NEOSYS should be able to update NAGIOS via the same proxy.

First check whether Internet Explorer on the server can reach NAGIOS. Type the following link into Internet Explorer on the server:

http://monitor.neosys.com

If you are asked to log in, then Internet Explorer is working OK. You do not need to log in. Just cancel and move on to the next step.

If Internet Explorer CANNOT connect to NAGIOS then resolving that fundamental issue will probably solve the NAGIOS connection issue too.
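The browser check above can be approximated from a script when no desktop session is handy. This is only a sketch: it uses Python's own proxy detection (urllib reads the Windows Internet Settings when no proxy environment variables are set), which is not necessarily identical to how NEOSYS detects the proxy. Treating an HTTP 401 response as "reachable" mirrors the login prompt described above and is an assumption about how the monitor server answers.

```python
import urllib.error
import urllib.request

MONITOR_URL = "http://monitor.neosys.com"


def interpret_status(status):
    """Map an HTTP status code to a reachability verdict."""
    if status == 401:
        return "reachable (login requested, no need to log in)"
    if 200 <= status < 400:
        return "reachable"
    return "reached a server but got HTTP {0}".format(status)


def check_monitor(url=MONITOR_URL, timeout=10):
    """Return (detected proxies, verdict) for a single request to the monitor."""
    proxies = urllib.request.getproxies()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return proxies, interpret_status(response.status)
    except urllib.error.HTTPError as error:
        # The server answered, possibly with the login challenge
        return proxies, interpret_status(error.code)
    except (urllib.error.URLError, OSError) as error:
        return proxies, "unreachable: {0}".format(error)
```

Running print(check_monitor()) on the client server shows which proxy settings were detected alongside the verdict, which helps when comparing against the Internet Explorer configuration.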

View the Internet Explorer proxy configuration as follows:

Ieproxy.png

If Internet Explorer CAN connect to NAGIOS then check if there is an issue with NEOSYS's http proxy server configuration as follows:

  1. Search for the UPDATE.$WG file located in the neosys\neosys folder and open it using Notepad or WordPad.
  2. You should find a message similar to the following:
Connecting to 192.168.100.145:8080 failed: No such file or directory.

Here the address appears to be a non-functional HTTP proxy server IP/port number, not the expected Nagios server IP number. The Windows proxy command shows the same IP and port:

On Windows 2003/XP

proxycfg

On Windows 2008/Win7

NetSH WinHTTP import Proxy ie

Output:

Microsoft (R) WinHTTP Default Proxy Configuration Tool
Copyright (c) Microsoft Corporation. All rights reserved.

Current WinHTTP proxy settings under:
  HKEY_LOCAL_MACHINE\
    SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\Connections\
      WinHttpSettings :

    Proxy Server(s) :  192.168.100.145:8080
    Bypass List     :  192.168.*.*;localhost;<local>
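When comparing several servers it can help to pull the proxy details out of this output programmatically. The following is a minimal sketch that parses text in the format shown above; the function name is hypothetical, and output from other Windows versions may be laid out differently.

```python
import re

FIELD = re.compile(r"\s*(Proxy Server\(s\)|Bypass List)\s*:\s*(\S.*?)\s*$")


def parse_winhttp_proxy(output):
    """Extract the proxy server and bypass list from proxycfg-style output."""
    proxy = None
    bypass = []
    for line in output.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        if match.group(1).startswith("Proxy"):
            proxy = match.group(2)
        else:
            bypass = match.group(2).split(";")
    return {"proxy": proxy, "bypass": bypass}


sample = """
    Proxy Server(s) :  192.168.100.145:8080
    Bypass List     :  192.168.*.*;localhost;<local>
"""
print(parse_winhttp_proxy(sample))
```

If the output contains no "Proxy Server(s)" line, the returned proxy is None, which can be compared directly with the address found in UPDATE.$WG.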

Solution 1 - Remove the above setting to create a direct connection

  1. To remove the registry entries that ProxyCfg.exe creates, you must delete the WinHttpSettings value from the following registry key:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\Connections\WinHttpSettings
  2. Afterwards, confirm that the proxy details are deleted by running the proxycfg command.
  3. Next, restart the NEOSYS processes for the changes to take effect.


Solution 2 - Configure a functioning proxy ip/port number

  1. Use the proxycfg command to enter a working proxy IP/port number/exclusion list: http://msdn.microsoft.com/en-us/library/aa384069%28VS.85%29.aspx
  2. Afterwards, confirm that the new proxy details are working by running the proxycfg command again.
  3. Next, restart the NEOSYS processes for the changes to take effect.
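The two solutions above can be written down as command lines so that they are easy to review before running. This is a sketch only: it builds the commands and prints them without executing anything. The `reg delete` form is one way to delete the WinHttpSettings value named in Solution 1, the `proxycfg -p` form applies to Windows 2003/XP, and the proxy address in the example is hypothetical.

```python
# Registry key that holds the WinHttpSettings value (from Solution 1)
CONNECTIONS_KEY = (r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion"
                   r"\Internet Settings\Connections")


def remove_proxy_command():
    """Solution 1: delete the WinHttpSettings value for a direct connection."""
    return ["reg", "delete", CONNECTIONS_KEY, "/v", "WinHttpSettings", "/f"]


def set_proxy_command(proxy, bypass=None):
    """Solution 2: proxycfg -p <proxy> [<bypass-list>] on Windows 2003/XP."""
    command = ["proxycfg", "-p", proxy]
    if bypass:
        command.append(";".join(bypass))
    return command


# Hypothetical working proxy, for illustration only
print(remove_proxy_command())
print(set_proxy_command("10.0.0.1:3128", ["localhost", "<local>"]))
```

Whichever solution is applied, run proxycfg afterwards to confirm the result, as described in step 2 of each solution.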


Note: Please refer to the [Hidden Processes] link before you restart NEOSYS processes.

On the client server, look at the text of UPDATE.$WG and other UPDATE.* files in the client's NEOSYS installation neosys/neosys folder for clues.
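Looking through the UPDATE.* files can be partly automated. The following is a minimal sketch that searches each file for "Connecting to ... failed" lines like the one quoted earlier and reports the address involved; the function names are hypothetical, and the files may contain other kinds of clues that this pattern will not catch.

```python
import glob
import os
import re

FAILED = re.compile(
    r"Connecting to (\d{1,3}(?:\.\d{1,3}){3}):(\d+) failed: (.*)")


def find_connection_failures(text):
    """Return (ip, port, reason) for every failed connection in the text."""
    return [(m.group(1), int(m.group(2)), m.group(3).strip())
            for m in FAILED.finditer(text)]


def scan_update_files(folder):
    """Scan UPDATE.* files in the given neosys\\neosys folder for failures."""
    results = {}
    for path in sorted(glob.glob(os.path.join(folder, "UPDATE.*"))):
        with open(path, "r", errors="replace") as handle:
            failures = find_connection_failures(handle.read())
        if failures:
            results[os.path.basename(path)] = failures
    return results
```

An address that matches the WinHTTP proxy setting rather than the Nagios server points towards the proxy configuration as the cause.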

There is a problem with the USB media inserted for backup

Refer to the wiki links below:
http://techwiki.neosys.com/index.php/Troubleshooting_NEOSYS_Generally#Solving_NEOSYS_process_hung_due_to_.E2.80.9CAbort.2C_Retry.2C_Fail.E2.80.9D
http://techwiki.neosys.com/index.php/Troubleshooting_NEOSYS_Generally#Error_message:_.E2.80.9CAbort.2C_Retry.2C_Fail.E2.80.9D

Troubleshooting Network Outages reflected in Nagios due to reassigning of router name or IP address

Nagios displays errors if a router name or the IP address it is monitoring has been reassigned.

We can resolve this issue by finding the IP address of the ISP router just before the NEOSYS server.

Steps:

  1. Log in to Nagios.
  2. Click on Tactical Overview -> Network Outages, then click on Blocking Outages to view them.

    Tracert-00.jpg

  3. You will now see the host/ISP which is down. Click on the status map icon to identify the host associated with the ISP, as shown below:

    Tracert-01.jpg

  4. From the Network Map displayed, identify the host associated with the ISP.

    Tracert-02.jpg

  5. If you already know the IP address of the host, skip to the next step. Otherwise, in Nagios, click on Host Detail, then on the hostname identified earlier, and save the host URL from the Host Details shown.

    Tracert-03.jpg

    Tracert-04.jpg

  6. Log onto www.network-tools.com:
    • Select Trace
    • Enter the host's IP address if already known, otherwise the host URL
    • Click on Go

      Tracert-05.jpg

  7. The trace route should complete successfully, revealing the IP address of the ISP router just before the NEOSYS server.

    Tracert-06.jpg

  8. You can now log in to ZoneEdit and update the IP address of the host.
  9. Check Nagios.
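Step 7 above amounts to reading the hop just before the destination from the trace output. The following is a rough sketch, assuming the trace is available as text in a Windows tracert-like layout where each hop line starts with its hop number; the sample output uses reserved documentation addresses and is purely illustrative.

```python
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def hop_addresses(trace_output):
    """Return one address per hop line; None marks a hop that timed out."""
    hops = []
    for line in trace_output.splitlines():
        # Hop lines start with the hop number; header lines do not
        if not re.match(r"\s*\d+\s", line):
            continue
        found = IPV4.findall(line)
        hops.append(found[-1] if found else None)
    return hops


def hop_before_destination(trace_output):
    """Return the last responding hop before the final hop, if any."""
    hops = hop_addresses(trace_output)
    for address in reversed(hops[:-1]):
        if address is not None:
            return address
    return None


# Hypothetical trace, for illustration only
sample_trace = """Tracing route to server.example.com [203.0.113.10]
  1     1 ms     1 ms     1 ms  192.0.2.1
  2    12 ms    11 ms    12 ms  198.51.100.1
  3     *        *        *     Request timed out.
  4    25 ms    24 ms    25 ms  203.0.113.1
  5    26 ms    25 ms    26 ms  203.0.113.10
"""
print(hop_before_destination(sample_trace))
```

Hops that time out are skipped, so the function falls back to the nearest earlier hop that responded.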

Nagios reports a hung process

Possible Causes and Solutions

Refer to the link: http://techwiki.neosys.com/index.php/Troubleshooting_NEOSYS_Generally#Error_message:_.22Read_error_in_the_operating_system_file.22

Backup -> Impossible

Possible Causes and Solutions

If Nagios shows the error "Backup->Impossible", check that the USB media is properly inserted and schedule downtime in Nagios for 2 hours.
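Downtime is normally scheduled from the Nagios web interface. For reference, the equivalent external command can be sketched as follows, assuming the standard SCHEDULE_SVC_DOWNTIME field order; the host, service, author and comment values are hypothetical.

```python
import time


def schedule_service_downtime(host, service, author, comment,
                              hours=2, now=None):
    """Build a Nagios SCHEDULE_SVC_DOWNTIME external command line."""
    start = int(time.time() if now is None else now)
    duration = int(hours * 3600)
    end = start + duration
    # fixed=1: downtime runs from start to end; trigger_id=0: not triggered
    return ("[{0}] SCHEDULE_SVC_DOWNTIME;{1};{2};{3};{4};1;0;{5};{6};{7}"
            .format(start, host, service, start, end, duration,
                    author, comment))


# Hypothetical example values, printed rather than sent to Nagios
print(schedule_service_downtime("client1", "Backup", "support",
                                "USB media being checked"))
```

The 2-hour default matches the downtime suggested above; pass a different hours value if the media cannot be fixed within that window.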