How to use Nagios Core¶
Nagios is a system for remotely monitoring servers. It comes in two parts: Nagios Core (covered in this HOWTO), which provides a front end and database for server logging information, and NRPE (covered here), a collection of agents that gather information from servers and feed it back to Core.
Nagios Core is currently installed on bc-monitor. If you need access to it, ask the CTO for the user name and password.
Installing Nagios¶
First, install the following dependencies if they are not already installed:
- nagios3
- apache2
- php5
- openssl
On an Ubuntu machine you can do this with apt-get:
apt-get install nagios3 apache2 php5 openssl
In the installation wizard, select internet site:
Replace the example hostname with your hostname:
After clicking next, you will be taken to this screen where you will need to set up a password:
Restart the Nagios service:
service nagios3 restart
Ensure that Apache is running, then open http://localhost:8080/nagios3 in your browser. If necessary, replace localhost with your server's IP address and 8080 with the port your server is using.
Configuring Nagios¶
Adding a new service¶
First, open the service configuration file /etc/nagios3/conf.d/services_nagios2.cfg in a text editor.
Then define a new service using the keywords define service (see the example below).
Add the necessary properties to the object.
Some properties are mandatory whilst others are optional.
This page has more information on objects.
The check_command property (in this case check_ssh) names the command that checks this service.
Also, ensure that the host group named in hostgroup_name is defined in /etc/nagios3/conf.d/hostgroups_nagios2.cfg; otherwise Nagios will fail to restart.
define service {
    hostgroup_name ssh-servers
    service_description SSH
    check_command check_ssh
    use generic-service
    notification_interval 120 ; set > 0 if you want to be renotified
    contact_groups admins
}
When you have finished, restart Nagios:
service nagios3 restart
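Because an undefined host group stops Nagios from restarting, it can help to check the reference before restarting. The following is a minimal Python sketch, not part of Nagios itself: `parse_objects` and `undefined_hostgroups` are hypothetical helper names, the parser handles only simple `define` blocks, and the host group sample is invented for illustration.

```python
import re

# One "define <type> { ... }" block. Object definitions do not nest, so a
# non-greedy match up to the first closing brace is enough for this sketch.
BLOCK_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Return a list of (object_type, {directive: value}) pairs."""
    # Strip trailing ';' comments and whole-line '#' comments before matching.
    lines = []
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if line and not line.startswith("#"):
            lines.append(line)
    objects = []
    for obj_type, body in BLOCK_RE.findall("\n".join(lines)):
        directives = {}
        for line in body.splitlines():
            parts = line.split(None, 1)
            if parts:
                directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, directives))
    return objects


def undefined_hostgroups(services_text, hostgroups_text):
    """Return host group names used by services but not defined anywhere."""
    defined = {
        d.get("hostgroup_name")
        for obj_type, d in parse_objects(hostgroups_text)
        if obj_type == "hostgroup"
    }
    missing = set()
    for obj_type, d in parse_objects(services_text):
        if obj_type != "service":
            continue
        for name in d.get("hostgroup_name", "").split(","):
            name = name.strip()
            if name and name not in defined:
                missing.add(name)
    return sorted(missing)


# The service definition from this page.
SERVICES = """
define service {
    hostgroup_name ssh-servers
    service_description SSH
    check_command check_ssh
    use generic-service
    notification_interval 120 ; set > 0 if you want to be renotified
    contact_groups admins
}
"""

# Hypothetical host group definition, for illustration only.
HOSTGROUPS = """
define hostgroup {
    hostgroup_name ssh-servers
    alias SSH servers
    members localhost
}
"""

print(undefined_hostgroups(SERVICES, HOSTGROUPS))
```

In practice the two strings would be read from services_nagios2.cfg and hostgroups_nagios2.cfg. An empty list means every referenced host group is defined. Nagios can also verify the whole configuration itself with its `-v` option, which is the more thorough check.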
Adding alerts¶
To get email alerts, you first have to be a member of a contact group.
A new contact can be added to /etc/nagios3/conf.d/contacts_nagios2.cfg.
This is done using define contact, with the object properties set inside the braces.
define contact {
    contact_name root
    alias Root
    service_notification_period 24x7
    host_notification_period 24x7
    service_notification_options w,u,c,r
    host_notification_options d,r
    service_notification_commands notify-service-by-email
    host_notification_commands notify-host-by-email
    email NAME@DOMAIN.com
}
You then have to define a group in the same file (/etc/nagios3/conf.d/contacts_nagios2.cfg) and add members to that group:
define contactgroup {
    contactgroup_name admins
    alias Nagios Administrators
    members root
}
Afterwards, go to the service definitions and set the contact_groups property of the chosen service to the name of the contact group you've defined.
Those who are part of the group will then be emailed alerts.
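The chain from service to contact group to contact can be sanity-checked before restarting. The sketch below assumes the definitions have already been loaded into plain dictionaries; `check_contact_chain` and `split_list` are hypothetical helper names, not Nagios functions.

```python
def split_list(value):
    """Split a comma-separated directive value into a list of clean names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_contact_chain(contacts, contactgroups, services):
    """Return a list of problems; an empty list means every reference resolves."""
    problems = []
    contact_names = {c["contact_name"] for c in contacts}
    group_names = set()

    # Every member of a contact group must be a defined contact.
    for group in contactgroups:
        group_names.add(group["contactgroup_name"])
        for member in split_list(group.get("members", "")):
            if member not in contact_names:
                problems.append(
                    f"contactgroup {group['contactgroup_name']}: unknown member {member}"
                )

    # Every contact group named by a service must be defined.
    for service in services:
        for name in split_list(service.get("contact_groups", "")):
            if name not in group_names:
                problems.append(
                    f"service {service['service_description']}: unknown contact group {name}"
                )
    return problems


# Values taken from the definitions on this page.
contacts = [{"contact_name": "root", "email": "NAME@DOMAIN.com"}]
contactgroups = [{"contactgroup_name": "admins", "members": "root"}]
services = [{"service_description": "SSH", "contact_groups": "admins"}]

print(check_contact_chain(contacts, contactgroups, services))
```

With the values from this page the list is empty. Misspelling the group name in the service definition would produce one entry naming the service and the unknown group.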
Service fields¶
These fields are compulsory and have to be part of all service definitions:
Field | Use |
---|---|
host_name | specify the short name(s) of the host(s) that the service "runs" on or is associated with. Multiple hosts should be separated by commas. |
service_description | define the description of the service, which may contain spaces, dashes, and colons (semicolons, apostrophes, and quotation marks should be avoided). No two services associated with the same host can have the same description. Services are uniquely identified with their host_name and service_description directives. |
max_check_attempts | define the number of times that Nagios will retry the service check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the service check again. |
check_interval | define the number of "time units" to wait before scheduling the next "regular" check of the service. "Regular" checks are those that occur when the service is in an OK state or when the service is in a non-OK state, but has already been rechecked max_check_attempts times. |
retry_interval | define the number of "time units" to wait before scheduling a re-check of the service. Services are rescheduled at the retry interval when they have changed to a non-OK state. Once the service has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation. |
check_period | specify the short name of the time period during which active checks of this service can be made. |
notification_interval | define the number of "time units" to wait before re-notifying a contact that this service is still in a non-OK state. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will not re-notify contacts about problems for this service - only one problem notification will be sent out. |
notification_period | specify the short name of the time period during which notifications of events for this service can be sent out to contacts. No service notifications will be sent out during times that are not covered by the time period. |
contacts | This is a list of the short names of the contacts that should be notified whenever there are problems (or recoveries) with this service. Multiple contacts should be separated by commas. Useful if you want notifications to go to just a few people and don't want to configure contact groups. You must specify at least one contact or contact group in each service definition. |
contact_groups | This is a list of the short names of the contact groups that should be notified whenever there are problems (or recoveries) with this service. Multiple contact groups should be separated by commas. You must specify at least one contact or contact group in each service definition. |
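The interval fields above are expressed in "time units", so a small calculation shows what a given setting means in practice. This is a rough sketch under two assumptions: interval_length is left at its default of 60, and the first non-OK result counts as attempt 1, so the remaining attempts are spaced retry_interval apart. The function names and the example values for max_check_attempts and retry_interval are hypothetical.

```python
def units_to_seconds(units, interval_length=60):
    """Convert Nagios "time units" to seconds (interval_length defaults to 60)."""
    return units * interval_length


def seconds_until_last_retry(max_check_attempts, retry_interval, interval_length=60):
    """Approximate delay between the first non-OK result and the final retry.

    Ignores check latency and scheduler jitter.
    """
    retries = max(max_check_attempts - 1, 0)
    return units_to_seconds(retries * retry_interval, interval_length)


# notification_interval 120, as in the SSH service example on this page.
renotify_seconds = units_to_seconds(120)
print(renotify_seconds / 3600, "hours between repeat notifications")

# Hypothetical values: 4 attempts, retried every 1 time unit.
print(seconds_until_last_retry(4, 1), "seconds from first failure to last retry")
```

With max_check_attempts set to 1 the delay is zero, which matches the table: Nagios generates an alert without retrying the check.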
Contact fields¶
These fields are compulsory and have to be part of all contact definitions:
Field | Use |
---|---|
contact_name | define a short name used to identify the contact. It is referenced in contact group definitions. Under the right circumstances, the $CONTACTNAME$ macro will contain this value. |
host_notifications_enabled | determine whether or not the contact will receive notifications about host problems and recoveries. Values: 0 = don't send notifications, 1 = send notifications. |
service_notifications_enabled | determine whether or not the contact will receive notifications about service problems and recoveries. Values: 0 = don't send notifications, 1 = send notifications. |
host_notification_period | specify the short name of the time period during which the contact can be notified about host problems or recoveries. You can think of this as an "on call" time for host notifications for the contact. Read the documentation on time periods for more information on how this works and potential problems that may result from improper use. |
service_notification_period | specify the short name of the time period during which the contact can be notified about service problems or recoveries. You can think of this as an "on call" time for service notifications for the contact. Read the documentation on time periods for more information on how this works and potential problems that may result from improper use. |
host_notification_options | define the host states for which notifications can be sent out to this contact. Valid options are a combination of one or more of the following: d = notify on DOWN host states, u = notify on UNREACHABLE host states, r = notify on host recoveries (UP states), f = notify when the host starts and stops flapping, and s = send notifications when host or service scheduled downtime starts and ends. If you specify n (none) as an option, the contact will not receive any type of host notifications. |
host_notification_commands | define a list of the short names of the commands used to notify the contact of a host problem or recovery. Multiple notification commands should be separated by commas. All notification commands are executed when the contact needs to be notified. The maximum amount of time that a notification command can run is controlled by the notification_timeout option. |
service_notification_commands | define a list of the short names of the commands used to notify the contact of a service problem or recovery. Multiple notification commands should be separated by commas. All notification commands are executed when the contact needs to be notified. The maximum amount of time that a notification command can run is controlled by the notification_timeout option. |
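The single-letter values of host_notification_options are easy to misread, so the short sketch below expands a value such as d,r into the descriptions from the table above. `describe_host_options` is a hypothetical helper, and the mapping covers only the host options listed on this page.

```python
# Letter codes for host_notification_options, as described in the table above.
HOST_OPTIONS = {
    "d": "DOWN host states",
    "u": "UNREACHABLE host states",
    "r": "host recoveries (UP states)",
    "f": "host starts and stops flapping",
    "s": "scheduled downtime starts and ends",
    "n": "none",
}


def describe_host_options(value):
    """Expand a comma-separated option string into readable descriptions."""
    letters = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [letter for letter in letters if letter not in HOST_OPTIONS]
    if unknown:
        raise ValueError(f"unknown host notification option(s): {', '.join(unknown)}")
    if "n" in letters:
        return []  # n means the contact receives no host notifications at all
    return [HOST_OPTIONS[letter] for letter in letters]


# The value used for the root contact on this page.
print(describe_host_options("d,r"))
```

Rejecting unknown letters rather than ignoring them is a design choice for this sketch: a typo in the option string is then caught before the configuration is deployed.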
Optional fields¶
There are also fields that are not compulsory but can be useful. These can be found here.
Accessing Nagios remotely¶
Accessing Nagios remotely is fairly straightforward.
If you enter the host's IP address, a colon, and the port, followed by /nagios3, you should be taken to the Nagios page, e.g.: http://127.0.0.1:8080/nagios3
For security purposes, you will be prompted to enter your user name and password.
If you do not know your IP address, you can find it with:
- ipconfig (Windows)
- ifconfig (Ubuntu)
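The address format described above (IP address, colon, port, then /nagios3) can be assembled programmatically, for example when bookmarking several monitoring hosts. This is a small sketch using only the Python standard library; `nagios_url` is a hypothetical helper, and the default port 8080 and path /nagios3 are simply the values used on this page.

```python
import ipaddress
from urllib.parse import urlunsplit


def nagios_url(host, port=8080, path="/nagios3"):
    """Build the Nagios web address for a host name or IP address."""
    try:
        # IPv6 literals must be wrapped in brackets inside a URL.
        if ipaddress.ip_address(host).version == 6:
            host = f"[{host}]"
    except ValueError:
        pass  # not an IP literal, so treat it as a host name
    return urlunsplit(("http", f"{host}:{port}", path, "", ""))


print(nagios_url("127.0.0.1"))
print(nagios_url("bc-monitor"))
```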
Diagnosing an issue¶
Generating histograms¶
Histograms will give you a visual representation of the state of a service over a period of time, which can be useful in diagnosing defects.
To create a histogram, first go to the histogram creation page:
Select whether you want to generate a histogram for a host or a service. Then configure the options to suit your requirements:
After you finish configuring, the graph should be generated:
Gathering more information¶
If you click on a service, you will be taken to the service information page where a summary will be displayed. However, you also get the ability to take a closer look using the links on the top left of the page:
- View Information For This Host
- View Status Detail For This Host
- View Alert History For This Service
- View Trends For This Service
- View Alert Histogram For This Service
- View Availability Report For This Service
- View Notifications For This Service