Handling Nagios Client Monitoring System

From NEOSYS Technical Support Wiki

Latest revision as of 05:07, 6 June 2021

Procedure to handle Nagios

The procedure that support staff must follow when handling Nagios is documented under Procedures: Handling Nagios Client Monitoring System.

Nagios is accessed via this link: http://monitor.neosys.com/nagios3

Nagios services

Nagios is configured to display the server status of every NEOSYS client, covering multiple services such as:

  1. HTTPS: Most NEOSYS clients are configured to allow external web access over HTTPS (port 4430) from outside the office. Nagios checks port 4430 every 10 minutes and displays any problems accessing it.
  2. SSH: As part of the support contract, NEOSYS should have secure external access to the client server, usually over port 19580. Nagios checks this port every 10 minutes and displays any problems accessing it.
  3. Ping: Nagios also pings the client router, to check whether the router still responds in case the NEOSYS server is down.
  4. NEOSYS: This service works in the reverse direction: the NEOSYS installation on the client server sends information such as running databases, current backup status, and internal and internet IP addresses to Nagios every 10 minutes.
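Nagios performs these checks with plugins defined in its command configuration; the actual commands used for NEOSYS clients are not shown here. As a rough illustration of what a TCP port check such as the HTTPS (4430) or SSH (19580) check does, the following minimal Python sketch opens a TCP connection with a timeout and maps the result onto the standard Nagios plugin exit codes (0 = OK, 2 = CRITICAL). The function name and message wording are assumptions, only loosely modelled on the stock check_tcp plugin.

```python
import socket
import time

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_port(host, port, timeout=10.0):
    """Try a TCP connection to host:port; return (exit_code, status_line).

    Hypothetical helper for illustration, not the check NEOSYS actually uses.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and connects within `timeout`
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
            return OK, f"TCP OK - {elapsed:.3f} second response time on {host} port {port}"
    except socket.timeout:
        return CRITICAL, f"CRITICAL - Socket timeout after {timeout:g} seconds"
    except OSError as exc:
        # Connection refused, host unreachable, DNS failure, etc.
        return CRITICAL, f"CRITICAL - cannot connect to {host} port {port}: {exc}"
```

For example, check_port("client.example.com", 4430) (a hypothetical host name) returns exit code 0 when the port answers and 2 otherwise.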

Some key information about Nagios is as follows:

  • Nagios is also configured to display information related to internal servers.
  • Clients hosted on a NEOSYS cloud server may not have services such as SSH or Ping, as these are monitored as part of the internal server's services.
  • Nagios sends out email alerts to support2@neosys.com (which is forwarded to support@neosys.com) from 8 am to 12 midnight on all Dubai working days (Sun-Thu). No alerts are sent out on Fri and Sat, unless they are for NEOSYS internal servers.
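The alerting window above can be expressed as a small rule. The Python sketch below is a hypothetical illustration, not the actual Nagios timeperiod configuration: it uses a fixed UTC+4 offset for Dubai (which does not observe daylight saving time), treats Sunday to Thursday from 08:00 until midnight as the window for client servers, and assumes that internal servers are alerted during the same hours on every day of the week, since the text above only says they are exempt from the Friday/Saturday rule.

```python
from datetime import datetime, time, timedelta, timezone

# Dubai is UTC+4 all year round (no daylight saving time)
DUBAI = timezone(timedelta(hours=4), "Dubai")

# datetime.weekday(): Monday=0 ... Sunday=6, so Sun-Thu is {6, 0, 1, 2, 3}
WORKING_DAYS = {6, 0, 1, 2, 3}
WINDOW_START = time(8, 0)  # 8 am; the window ends at midnight


def should_email(when, internal_server=False):
    """Return True if an alert raised at `when` (timezone-aware) would be emailed."""
    local = when.astimezone(DUBAI)
    in_hours = local.time() >= WINDOW_START  # 08:00 up to 23:59:59
    if internal_server:
        # Assumption: internal servers use the same hours on every day
        return in_hours
    return in_hours and local.weekday() in WORKING_DAYS
```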

Nagios Configuration files:

  • Commands to check different services. ~/etc/nagios3/commands.cfg
  • Contacts, where support2@neosys.com is configured. Emails from Nagios are sent to the email ID configured here. /etc/nagios3/conf.d.backups/contacts.cfg
  • For an Oman client; purpose unclear. /etc/nagios3/conf.d.backups/dtme.cfg
  • Fully commented file of generic hostgroup definitions. /etc/nagios3/conf.d.backups/hostgroups_nagios2.cfg
  • Host group definition. /etc/nagios3/conf.d.backups/linux.cfg
  • File where all the NEOSYS hosts and hostgroups are defined. /etc/nagios3/conf.d.backups/neosysclients.cfg
  • Config for monitor.neosys.com. /etc/nagios3/conf.d.backups/monitor.cfg
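To get a quick list of what is defined in these files (for example, the client hosts in neosysclients.cfg), the object definitions can be read with a short script. The Python sketch below is a simplified parser written for illustration: it drops comment lines starting with # or ; and trailing ; comments, then collects each define <type> { ... } block into a dictionary. It does not handle templates, escaped semicolons or include directives, and the host and group names in the test are hypothetical.

```python
import re

# One "define <type> { ... }" block in a Nagios object configuration file
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def strip_comments(text):
    """Remove '#' comment lines and ';' comments (whole-line or trailing)."""
    kept = []
    for raw in text.splitlines():
        line = raw.split(";", 1)[0]
        if line.lstrip().startswith("#"):
            continue
        kept.append(line)
    return "\n".join(kept)


def parse_objects(text):
    """Return a list of (object_type, {directive: value}) tuples."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(strip_comments(text)):
        attrs = {}
        for line in body.splitlines():
            parts = line.split(None, 1)  # directive name, then the rest
            if parts:
                attrs[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, attrs))
    return objects
```

Reading neosysclients.cfg and keeping the entries whose type is "host" would then give the configured client hosts, subject to the simplifications above.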

How to handle a service error

  1. Open the Nagios Service Info page. You can get there via various routes, e.g. from Service Problems, then click the service name (not the host name).
  2. Under Service Commands, choose Acknowledge this service problem (only services with status Warning or Critical have this option).
  3. Enter a note explaining to yourself and your co-workers how the problem is being handled and when to follow up.

Notifications will be automatically resumed once the service becomes OK again.

Disabling notifications ("Disable notifications") is not quite the same thing, and it shows as red on the tactical summary screen.
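Acknowledgements can also be submitted outside the web interface through Nagios's external command file. The Python sketch below builds an ACKNOWLEDGE_SVC_PROBLEM command line; sticky=2 keeps the acknowledgement until the service returns to OK, which matches the behaviour described above. The command file path is an assumption based on a typical Debian nagios3 installation (check the command_file setting in nagios.cfg), external commands must be enabled, and the host, service and author names in the test are hypothetical.

```python
import time

# Assumed path (typical Debian nagios3 layout); confirm via command_file in nagios.cfg
COMMAND_FILE = "/var/lib/nagios3/rw/nagios.cmd"


def ack_command(host, service, author, comment, now=None):
    """Build an ACKNOWLEDGE_SVC_PROBLEM external command line."""
    ts = int(time.time() if now is None else now)
    comment = comment.replace("\n", " ")  # one command per line
    # sticky=2: acknowledgement lasts until the service is OK again
    # notify=1: send an acknowledgement notification to contacts
    # persistent=1: keep the acknowledgement comment across Nagios restarts
    return (f"[{ts}] ACKNOWLEDGE_SVC_PROBLEM;{host};{service};"
            f"2;1;1;{author};{comment}\n")


def submit(command, path=COMMAND_FILE):
    """Append a command to the external command file (a named pipe)."""
    with open(path, "a") as pipe:
        pipe.write(command)
```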

Updating Nagios in case of backup failures

How to stop ALL notifications

This is useful for stopping a flood of alerts, whatever the cause.

  1. Go to Nagios Process Info.
  2. Click Enable/Disable notifications.
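The same global switch exists as a pair of Nagios external commands, DISABLE_NOTIFICATIONS and ENABLE_NOTIFICATIONS, which take no arguments. The minimal Python sketch below only builds the command line; writing it to the external command file (its path is set by command_file in nagios.cfg) is left to the caller, and notifications stay off until the enable command is sent.

```python
import time


def notifications_command(enable, now=None):
    """Build the global ENABLE_NOTIFICATIONS / DISABLE_NOTIFICATIONS command."""
    ts = int(time.time() if now is None else now)
    name = "ENABLE_NOTIFICATIONS" if enable else "DISABLE_NOTIFICATIONS"
    return f"[{ts}] {name}\n"
```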

Speeding up Nagios web interface

Pressing F5 refreshes the page before the automatic 90-second refresh, but Ctrl+F5 does not work.

Speeding up NEOSYS process checkins

You can force a NEOSYS service check-in from NEOSYS maintenance mode (any process/database) by pressing F5 and entering:

MONITOR2

Adding the client to Nagios

Troubleshooting NAGIOS generally

Fixing “CRITICAL – Socket timeout after 10 secs” error message on NAGIOS

Error Message

Vm3nagios.jpg

Problem

NAGIOS is not updating services such as CPU Load, Drive Space C:, Drive Space D:, Explorer, Memory Usage etc.

Solution

Open Windows Task Manager and kill any nscp.exe process. Then restart NSClient++ from the desktop or via Start > Programs > NSClient++. These steps can be carried out even while users are active.
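If you are already at the server's command line, the Task Manager step can be done with the standard Windows taskkill command (/F forces termination, /IM selects processes by image name). The Python sketch below only wraps that command; restarting NSClient++ remains the manual step above, because how it is started on each server is not stated here.

```python
import subprocess
import sys


def kill_nscp_command():
    """taskkill arguments: /F forces termination, /IM matches by image name."""
    return ["taskkill", "/F", "/IM", "nscp.exe"]


def kill_nscp():
    """Kill every nscp.exe process; returns (exit code, combined output)."""
    if not sys.platform.startswith("win"):
        raise RuntimeError("taskkill is only available on Windows")
    result = subprocess.run(kill_nscp_command(), capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr
```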

Fixing “NEOSYS has not checked in” error message

Error Message Explained

Nagios reports this error when it has not been able to update the status of NEOSYS for a particular client server, i.e. NEOSYS has not sent it a status update.
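Because the NEOSYS service is reported by the client server rather than polled, Nagios can only notice a missing report by timing (Nagios calls this freshness checking). The threshold actually configured for the NEOSYS service is not stated here, so the Python sketch below only illustrates the idea, using the 10-minute reporting interval plus a hypothetical 5-minute grace period.

```python
from datetime import datetime, timedelta, timezone

CHECKIN_INTERVAL = timedelta(minutes=10)  # NEOSYS reports every 10 minutes
GRACE = timedelta(minutes=5)              # hypothetical allowance for delays


def checkin_overdue(last_checkin, now):
    """True if no check-in arrived within one interval plus the grace period."""
    return now - last_checkin > CHECKIN_INTERVAL + GRACE
```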

Possible Causes & Solutions

The maintenance window is left open

Make sure that no maintenance window has been left open on the server.

Hung process on server

Check whether there are any hung processes on the server, e.g. "Fatal Error in Rev Restart". Follow the steps for troubleshooting a hung process.

The NEOSYS process IS NOT running on the server

If no NEOSYS processes are running, start the NEOSYS process and wait for it to check into Nagios, or force Nagios to re-check the status of the NEOSYS service.

The NEOSYS process IS running on the server but still cannot connect to NAGIOS

NEOSYS connects to Nagios using HTTP and automatically detects and uses any HTTP proxy configured in Internet Explorer.

If Internet Explorer on the server can reach the internet, then NEOSYS should be able to update Nagios via the same proxy.

Check whether Internet Explorer on the server can reach Nagios by opening the link below. The check is successful if you get the Nagios login prompt shown in the image below.

https://monitor.hosts.neosys.com:4428

Nagios access.png
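If Python happens to be available on the server, the same check can be scripted. The sketch below requests the Nagios URL and, like the manual check, treats an HTTP 401 (login required) response as success. It is an illustration only: Python's urllib uses its own certificate verification and picks up proxy settings from environment variables (and, on Windows, from the registry), so its result may not match Internet Explorer exactly.

```python
import urllib.error
import urllib.request

NAGIOS_URL = "https://monitor.hosts.neosys.com:4428"


def interpret(status):
    """Map an HTTP status code to a check result."""
    if status == 401:
        # The server answered and asked for a login: the expected result
        return "OK - Nagios login prompt received"
    if 200 <= status < 400:
        return f"OK - HTTP {status}"
    return f"PROBLEM - unexpected HTTP {status}"


def probe(url=NAGIOS_URL, timeout=15.0):
    """Request `url` and describe whether Nagios could be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return interpret(resp.status)
    except urllib.error.HTTPError as exc:
        # urllib raises HTTPError for 4xx/5xx, including the expected 401
        return interpret(exc.code)
    except OSError as exc:
        # URLError (DNS, refused, TLS failure) and timeouts are OSError subclasses
        return f"FAILED - cannot reach {url}: {exc}"
```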

If Internet Explorer CANNOT connect to Nagios

The client's IT must enable outbound access to our NEOSYS server at monitor.hosts.neosys.com, preferably on all ports or at least on port 4428.

If outbound access to monitor.hosts.neosys.com on port 4428 has not been enabled, then Nagios will not be able to update the status of the host and will show the error "NEOSYS not checked in".

If port 4428 cannot be used, Support can manually configure NEOSYS to use the standard port 443 instead, in the system configuration file.

This requires adding the client's IP number to the list of allowed IP numbers on the nl10r router, because the Nagios HTTPS server receives multiple probes a day on port 443 if it is left open to everyone.
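To give the client's IT team a quick before-and-after test, a Python sketch like the one below can be run on the client server to see which outbound ports to monitor.hosts.neosys.com accept a TCP connection. Port 4428 is the default and 443 the fallback described above; a False result for 443 may simply mean the client's IP number has not yet been added to the allowed list on the nl10r router.

```python
import socket

MONITOR_HOST = "monitor.hosts.neosys.com"
PORTS = (4428, 443)  # default monitor port, then the standard HTTPS fallback


def outbound_report(host=MONITOR_HOST, ports=PORTS, timeout=5.0):
    """Return {port: True/False}: whether a TCP connection to host:port opened."""
    report = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                report[port] = True
        except OSError:
            # Refused, filtered (timeout), or name resolution failure
            report[port] = False
    return report
```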

If Internet Explorer CAN connect to Nagios

View the Internet Explorer proxy configuration as follows:

Ieproxy.png


Check whether there is an issue with NEOSYS's HTTP proxy server configuration as follows:

  1. Open the UPDATE.$WG or UPDATE.xWG file located in the neosys\neosys or hosts/xxxxxxxx/work folder.
  2. You should find a message similar to the following:
Connecting to 192.168.100.145:8080 failed: No such file or directory.

Here the address appears to be a non-functional HTTP proxy server IP/port number, not the expected Nagios server IP number. A Windows proxy command shows the same IP and port:

On Windows 2003/XP

proxycfg

On Windows 2008/Win7

NetSH WinHTTP import Proxy ie

Output:

Microsoft (R) WinHTTP Default Proxy Configuration Tool
Copyright (c) Microsoft Corporation. All rights reserved.

Current WinHTTP proxy settings under:
  HKEY_LOCAL_MACHINE\
    SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\Connections\
      WinHttpSettings :

    Proxy Server(s) :  192.168.100.145:8080
    Bypass List     :  192.168.*.*;localhost;<local>
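When comparing the proxy shown by this command with the address in the UPDATE file, the two fields can be pulled out of the output programmatically. The Python sketch below assumes the output layout shown above (a "Proxy Server(s) :" line and a "Bypass List :" line) and returns None for a field that is missing, for example when no proxy is configured.

```python
import re

PROXY_RE = re.compile(r"Proxy Server\(s\)\s*:\s*(\S+)")
BYPASS_RE = re.compile(r"Bypass List\s*:\s*(\S+)")


def parse_winhttp_proxy(output):
    """Return (proxy, bypass_list) from WinHTTP proxy tool output, or None values."""
    proxy = PROXY_RE.search(output)
    bypass = BYPASS_RE.search(output)
    return (proxy.group(1) if proxy else None,
            bypass.group(1) if bypass else None)
```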

Solution 1 - Remove the above setting to create a direct connection

  1. To remove the registry entries that ProxyCfg.exe creates, delete the WinHttpSettings value from the following registry key:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\Connections
  2. After you do the above, confirm that the proxy details have been deleted by running the proxycfg command.
  3. Next, restart the NEOSYS processes for the changes to take effect.

Solution 2 - Configure a functioning proxy ip/port number

  1. Use the proxycfg command to enter a working proxy IP/port number and exclusion list: http://msdn.microsoft.com/en-us/library/aa384069%28VS.85%29.aspx
  2. After you do the above, confirm that the new proxy details are working by running the proxycfg command again.
  3. Next, restart the NEOSYS processes for the changes to take effect.

Note: Please refer to the following link before you restart NEOSYS processes: Closing NEOSYS Services

On the client server, look at the text of UPDATE.$WG or UPDATE.xWG and other UPDATE.* files in the client's NEOSYS installation neosys/neosys or hosts/xxxxxxxx/work folder for clues.
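Reading through several UPDATE.* files by hand can be tedious. A minimal Python sketch follows, assuming Python can be run on the client server. The keyword list and the function name `find_update_clues` are assumptions for illustration, not part of NEOSYS.

```python
from pathlib import Path

# Assumed keywords that tend to mark useful lines; extend as needed.
CLUE_WORDS = ("failed", "error", "cannot", "refused", "timed out")


def find_update_clues(folder: str) -> list:
    """Return (file name, line number, text) for suspicious lines in UPDATE.* files."""
    clues = []
    for path in sorted(Path(folder).glob("UPDATE.*")):
        if not path.is_file():
            continue
        # errors="replace" keeps going if a file contains non-UTF-8 bytes.
        text = path.read_text(errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            if any(word in line.lower() for word in CLUE_WORDS):
                clues.append((path.name, line_no, line.strip()))
    return clues
```

Pass it the path of the neosys\neosys or hosts/xxxxxxxx/work folder and read the returned lines as starting points, not as a diagnosis.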

There is a problem with the USB media inserted for backup

Refer here

NEOSYS thinks it sees a new neosys2.exe upgrade file at http://www.neosys.com/support/neosys2.exe and attempts to download it

Refer here

Fixing "Cannot make SSL connection" error

Error Message

SSL.jpg

Users get the message "This page cannot be displayed" when they try to access the HTTPS website.

Cause

When there are multiple HTTPS sites on one server, creating a further self-certified SSL certificate seems to break all other self-certified sites for which the export/REMOVE/import step was not done for some reason. See Export, Remove and Import Step

This issue only becomes evident after the server is restarted.

Solution

Re-install certificates. See Re-installing Certificates

Fixing "check_ssl_certificates" error

Cause

This error appears when the SSL certificate that the website is bound to is no longer valid.

Solution

Bind the website to the latest valid SSL certificate. Refer to Setting up HTTPS
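To see whether a certificate has expired or is close to expiring, a sketch like the following may help. It is a minimal Python example under stated assumptions: the function names are hypothetical, and `fetch_not_after` uses full certificate verification, so a self-signed, expired or otherwise untrusted certificate makes it raise `ssl.SSLCertVerificationError` rather than return a date (which is itself a useful signal).

```python
import socket
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days from now until a certificate's notAfter date (negative once expired)."""
    expiry = ssl.cert_time_to_seconds(not_after)  # e.g. "Jan  5 09:34:43 2018 GMT"
    current = time.time() if now is None else now
    return (expiry - current) / SECONDS_PER_DAY


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the notAfter field of the certificate a server presents.

    Verification is on, so an untrusted or expired certificate raises
    ssl.SSLCertVerificationError instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For a client using the usual NEOSYS HTTPS port 4430 mentioned at the top of this page, `days_until_expiry(fetch_not_after(host, 4430))` gives the remaining validity, where `host` is the client's external address.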

Backup -> Impossible alert

Possible Causes and Solutions

If Nagios shows a "Backup->Impossible" error, check that the USB media is properly inserted and schedule 2 hours of downtime for it in Nagios.
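The downtime can also be scheduled from the Nagios server itself, because Nagios accepts external commands such as SCHEDULE_SVC_DOWNTIME written to its command file. The sketch below rests on assumptions: the command file path is the usual Debian/Ubuntu nagios3 default and may differ on monitor.neosys.com, and the host and service names you pass must match the names configured in Nagios. The helper names are hypothetical.

```python
import time
from typing import Optional

# Usual command file location for the Debian/Ubuntu nagios3 package (assumption).
COMMAND_FILE = "/var/lib/nagios3/rw/nagios.cmd"


def svc_downtime_command(host: str, service: str, hours: float = 2.0,
                         author: str = "support",
                         comment: str = "Checking USB backup media",
                         now: Optional[float] = None) -> str:
    """Build a fixed SCHEDULE_SVC_DOWNTIME external command starting now."""
    start = int(time.time() if now is None else now)
    duration = int(hours * 3600)
    end = start + duration
    # Field order: host;service;start;end;fixed;trigger_id;duration;author;comment.
    # fixed=1: downtime runs exactly from start to end; trigger_id=0: no trigger.
    fields = ["SCHEDULE_SVC_DOWNTIME", host, service, str(start), str(end),
              "1", "0", str(duration), author, comment]
    return f"[{start}] " + ";".join(fields)


def submit(command: str, command_file: str = COMMAND_FILE) -> None:
    """Write one external command to the Nagios command pipe."""
    with open(command_file, "w") as pipe:
        pipe.write(command + "\n")
```

Nagios only reads the command file if external commands are enabled (check_external_commands=1 in nagios.cfg); otherwise use the web interface as usual.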

"Change Backup" alert

Troubleshooting Network Outages reflected in Nagios due to reassigning of router name or IP address

Nagios displays errors if the name or IP address of a router that it is monitoring has been reassigned.

This can be resolved by finding the IP address of the ISP router immediately before the NEOSYS server.

Steps:

  1. Login to Nagios
  2. Click on Tactical Overview -> Network Outages and click on Blocking Outages to view

    Tracert-00.jpg

  3. You will now see the host/ISP which is down. Click on the status map icon to identify the host associated with the ISP, as shown below:

    Tracert-01.jpg

  4. From the Network Map displayed, identify the host associated with the ISP.

    Tracert-02.jpg

  5. If you already know the IP address of the host, skip to the next step. Otherwise, in Nagios, click Host Detail, then the hostname identified earlier, and save the host URL shown in Host Details.

    Tracert-03.jpg

    Tracert-04.jpg

  6. Go to www.network-tools.com:
    • Select Trace
    • Enter the host's ip address if already known or host url
    • Click on Go

      Tracert-05.jpg

  7. The trace route should complete successfully revealing the IP address of the ISP just before the NEOSYS server.

    Tracert-06.jpg

  8. Log in to ZoneEdit and update the IP address of the host.
  9. Check Nagios to confirm that the error has cleared.
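If you save the trace route output as text, picking out the hop just before the NEOSYS server can be automated. The following is a minimal Python sketch. It assumes Windows tracert or Unix traceroute style lines (hop number first, IPv4 address somewhere on the line), which may not match the network-tools.com page exactly, and it assumes the trace completed so that the last responding hop is the NEOSYS server itself. The function names are hypothetical.

```python
import re
from typing import List, Optional

# A hop line starts with the hop number, e.g. "  3    12 ms ... 203.0.113.1".
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def hop_addresses(trace_output: str) -> List[Optional[str]]:
    """Return one entry per hop: its IPv4 address, or None if the hop timed out."""
    hops: List[Optional[str]] = []
    for line in trace_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header, blank and "Trace complete." lines
        addresses = IPV4.findall(match.group(2))
        hops.append(addresses[-1] if addresses else None)
    return hops


def hop_before_target(trace_output: str) -> Optional[str]:
    """Return the address of the last responding hop before the final one."""
    responding = [hop for hop in hop_addresses(trace_output) if hop is not None]
    return responding[-2] if len(responding) >= 2 else None
```

The address it returns is the candidate ISP router IP address to check before updating ZoneEdit.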

Nagios reports a hung process

Possible Causes and Solutions

Refer link here

Explorer.exe not running

At present, Nagios displays this error only for Win3.

Possible Causes and Solutions

This error means that the server has rebooted (for whatever reason) and is stuck at the Windows login prompt, waiting for someone to enter the username and password. (More info on explorer.exe is available at http://en.wikipedia.org/wiki/Explorer.exe)

To solve this, log in via Tunnelier and open Remote Desktop Connection to log on to the server.
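explorer.exe is the Windows shell and only starts once someone has logged on, which is why its absence points to a server sitting at the login prompt. How Nagios performs this check is not documented here. The following is a hypothetical Python sketch of an equivalent check run on the server (for example over the Tunnelier connection), using the standard Windows tasklist command. The function names are assumptions.

```python
import subprocess


def explorer_running(tasklist_output: str) -> bool:
    """True if tasklist output lists an explorer.exe process."""
    for line in tasklist_output.splitlines():
        fields = line.split()
        if fields and fields[0].lower() == "explorer.exe":
            return True
    return False


def check_explorer() -> bool:
    """Run tasklist filtered on explorer.exe (Windows only) and parse the result."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq explorer.exe"],
        capture_output=True, text=True, check=True,
    )
    return explorer_running(result.stdout)
```

When nothing matches, tasklist prints an "INFO: No tasks are running..." message instead of a table, so `explorer_running` returns False.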

Configuring Sonicwall firewall to allow NEOSYS to update Nagios

This is documented at Configuring Sonicwall firewall to allow NEOSYS to update Nagios

Counting current active users

NEOSYS gives an estimate of the number of users currently active by counting how many users have been seen to be active, even once, within the last hour.

This can give a sense of the processing requirements for an installation.

The numbers can be seen on the Nagios installations screen in support.htm.

Example:

 Users: 5/4/2 Max: 7/5/2

Users means:

  • 5 unique browser session ids seen in the last hour
  • 4 unique user codes seen in the last hour
  • 2 unique IP numbers seen in the last hour

Max gives the maximum of each of the above figures seen in any one hour of the last 24 hours.
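The figures can be illustrated with a small Python sketch. It is only an assumption about how such counts could be computed (the actual NEOSYS implementation is not described here): each request is recorded as a hypothetical `Hit` record, "the last hour" is the 60 minutes ending now, and the 24-hour maximum is taken over 24 consecutive one-hour windows.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

HOUR = 3600  # seconds


@dataclass(frozen=True)
class Hit:
    """One request seen by NEOSYS (hypothetical record layout)."""
    time: float        # seconds since the epoch
    session_id: str    # browser session id
    user_code: str     # NEOSYS user code
    ip: str            # client IP number


def unique_counts(hits: Iterable[Hit], start: float, end: float) -> Tuple[int, int, int]:
    """Unique sessions, user codes and IP numbers seen in the window (start, end]."""
    window = [h for h in hits if start < h.time <= end]
    return (
        len({h.session_id for h in window}),
        len({h.user_code for h in window}),
        len({h.ip for h in window}),
    )


def users_summary(hits: Iterable[Hit], now: float) -> str:
    """Format 'Users: a/b/c Max: x/y/z' over the last hour and last 24 hours."""
    hits = list(hits)  # iterated once per hourly window
    hourly = [unique_counts(hits, now - (i + 1) * HOUR, now - i * HOUR) for i in range(24)]
    current = hourly[0]  # the most recent hour
    # Each maximum is taken independently over the 24 hourly windows.
    maximum = tuple(max(counts[k] for counts in hourly) for k in range(3))
    return "Users: {}/{}/{} Max: {}/{}/{}".format(*current, *maximum)
```

Because each maximum is taken separately, the three Max figures can come from different hours.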

In practice, the middle figure (the count of unique user codes) is very close to the real number of people active within an hour. However, since the same user code can be used by different people, even on different computers (although not at the same time), it could technically be an underestimate.

The number of currently active users is usually much lower than the number of registered users, because some registered users use the system infrequently and many don't use it all the time.

The total number of different user codes seen to be active over the whole day is not shown. For that, see Usage Statistics.