How to use Nagios Core¶

Nagios is a system for remotely monitoring servers. It comes in two parts: Nagios Core (covered in this HOWTO), which provides a front end and database for server logging information, and NRPE (covered here), which is a collection of agents that gather information from servers and feed it back to Core.

We currently have Nagios Core installed on bc-monitor. If you need access to it, please ask the CTO for the user name and password.

Installing Nagios¶

Firstly, install the relevant dependencies, if they are not already installed:

nagios3
apache2
php5
openssl

On an Ubuntu machine you can do this with apt-get:

apt-get install nagios3 apache2 php5 openssl

In the installation wizard, select internet site:

Replace the example hostname with your hostname:

After clicking next, you will be taken to this screen where you will need to set up a password:

Restart the Nagios service:

service nagios3 restart

Ensure that Apache is running, then open http://localhost:8080/nagios3 in your browser. If necessary, replace localhost with your server's IP address and 8080 with the port your server is using.

Configuring Nagios¶

Adding a new service¶

Firstly, open the service configuration file /etc/nagios3/conf.d/services_nagios2.cfg in a text editor. Then define a new service using the keywords define service (see the example below).

Add the necessary properties to the object. Some properties are mandatory whilst others are optional. This page has more information on objects. The check_command property (in this case check_ssh) names the command that checks this service. Also, ensure that the host group named in hostgroup_name is defined in /etc/nagios3/conf.d/hostgroups_nagios2.cfg, otherwise Nagios will fail to restart.

define service {
          hostgroup_name          ssh-servers
          service_description     SSH
          check_command           check_ssh
          use                     generic-service
          notification_interval   120 ; set > 0 if you want to be renotified
          contact_groups          admins
}
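
Because a missing host group stops Nagios from restarting, it can help to confirm the definition exists before you restart. The following is a minimal sketch rather than part of the Nagios tooling: hostgroup_defined is a hypothetical helper name, and it assumes that each hostgroup_name directive sits on its own line and that the group name contains no regular-expression metacharacters.

```shell
# Hypothetical helper: succeeds if the given host group is declared in the file.
# $1 = host group name, $2 = host group configuration file
hostgroup_defined() {
    # Match a line such as "    hostgroup_name    ssh-servers", allowing a trailing ; comment
    grep -Eq "^[[:space:]]*hostgroup_name[[:space:]]+$1[[:space:]]*(;.*)?\$" "$2"
}
```

For example, hostgroup_defined ssh-servers /etc/nagios3/conf.d/hostgroups_nagios2.cfg && echo "defined" prints "defined" only when the group is present.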

When you have finished, restart Nagios:

service nagios3 restart
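
A broken configuration will stop Nagios from coming back up, so one cautious approach is to run Nagios' own configuration check (the -v option) and restart only if it passes. This is a sketch under assumptions: the binary name nagios3, the main configuration path /etc/nagios3/nagios.cfg and the service name nagios3 are taken from the Ubuntu nagios3 package layout and may differ on your machine, and check_and_restart is a hypothetical function name.

```shell
# Assumed defaults for the Ubuntu nagios3 package; override them if your setup differs.
NAGIOS_BIN="${NAGIOS_BIN:-nagios3}"
NAGIOS_CFG="${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}"
RESTART_CMD="${RESTART_CMD:-service nagios3 restart}"

# Hypothetical helper: verify the configuration, then restart only on success.
check_and_restart() {
    # -v makes Nagios verify the configuration without starting the daemon
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        # Left unquoted on purpose so the command and its arguments are split
        $RESTART_CMD
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}
```

Run check_and_restart as root after editing any file under /etc/nagios3/conf.d. If the check reports errors, the running Nagios instance is left untouched.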

Adding alerts¶

To get email alerts, you first have to be a member of a contact group. A new contact can be added to /etc/nagios3/conf.d/contacts_nagios2.cfg. This is done using define contact, with the object properties set inside the braces.

define contact {
        contact_name                    root
        alias                           Root
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,r
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        email                           NAME@DOMAIN.com
}

You then have to define a group in the same file (/etc/nagios3/conf.d/contacts_nagios2.cfg) and add members to that group:

define contactgroup {
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 root
}

Afterwards, navigate to the service definitions and set the contact_groups property of the chosen service to the name of the contact group you've defined. Those who are part of the group will then be emailed alerts.
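
To see who would be emailed for a given contact group, you can read the members line out of the contacts file. The sketch below is illustrative only: contactgroup_members is a hypothetical helper name, and it assumes the one-directive-per-line layout used in the definitions in this HOWTO.

```shell
# Hypothetical helper: print the members of one contact group.
# $1 = contact group name, $2 = contacts configuration file
contactgroup_members() {
    awk -v group="$1" '
        # Start of a contactgroup block: reset what we have collected so far
        /^[ \t]*define[ \t]+contactgroup/ { in_block = 1; name = ""; members = ""; next }
        in_block && $1 == "contactgroup_name" { name = $2 }
        # Keep everything after the "members" keyword
        in_block && $1 == "members" { $1 = ""; sub(/^[ \t]+/, ""); members = $0 }
        # End of block: print the members if this was the requested group
        in_block && /}/ { if (name == group) print members; in_block = 0 }
    ' "$2"
}
```

For example, contactgroup_members admins /etc/nagios3/conf.d/contacts_nagios2.cfg prints the members line of the admins group defined in that file.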

Service fields¶

These fields are compulsory and have to be part of all service definitions:

Field	Use
host_name	specify the short name(s) of the host(s) that the service "runs" on or is associated with. Multiple hosts should be separated by commas.
service_description	define the description of the service, which may contain spaces, dashes, and colons (semicolons, apostrophes, and quotation marks should be avoided). No two services associated with the same host can have the same description. Services are uniquely identified with their host_name and service_description directives.
max_check_attempts	define the number of times that Nagios will retry the service check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the service check again.
check_interval	define the number of "time units" to wait before scheduling the next "regular" check of the service. "Regular" checks are those that occur when the service is in an OK state or when the service is in a non-OK state, but has already been rechecked max_check_attempts number of times.
retry_interval	define the number of "time units" to wait before scheduling a re-check of the service. Services are rescheduled at the retry interval when they have changed to a non-OK state. Once the service has been retried max_check_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation.
check_period	specify the short name of the time period during which active checks of this service can be made.
notification_interval	define the number of "time units" to wait before re-notifying a contact that this service is still in a non-OK state. Unless you've changed the interval_length directive from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will not re-notify contacts about problems for this service - only one problem notification will be sent out.
notification_period	specify the short name of the time period during which notifications of events for this service can be sent out to contacts. No service notifications will be sent out during times that are not covered by the time period.
contacts	This is a list of the short names of the contacts that should be notified whenever there are problems (or recoveries) with this service. Multiple contacts should be separated by commas. Useful if you want notifications to go to just a few people and don't want to configure contact groups. You must specify at least one contact or contact group in each service definition.
contact_groups	This is a list of the short names of the contact groups that should be notified whenever there are problems (or recoveries) with this service. Multiple contact groups should be separated by commas. You must specify at least one contact or contact group in each service definition.

Contact fields¶

These fields are compulsory and have to be part of all contact definitions:

Field	Use
contact_name	define a short name used to identify the contact. It is referenced in contact group definitions. Under the right circumstances, the $CONTACTNAME$ macro will contain this value.
host_notifications_enabled	determine whether or not the contact will receive notifications about host problems and recoveries. Values: 0 = don't send notifications, 1 = send notifications.
service_notifications_enabled	determine whether or not the contact will receive notifications about service problems and recoveries. Values: 0 = don't send notifications, 1 = send notifications.
host_notification_period	specify the short name of the time period during which the contact can be notified about host problems or recoveries. You can think of this as an "on call" time for host notifications for the contact. Read the documentation on time periods for more information on how this works and potential problems that may result from improper use.
service_notification_period	specify the short name of the time period during which the contact can be notified about service problems or recoveries. You can think of this as an "on call" time for service notifications for the contact. Read the documentation on time periods for more information on how this works and potential problems that may result from improper use.
host_notification_options	define the host states for which notifications can be sent out to this contact. Valid options are a combination of one or more of the following: d = notify on DOWN host states, u = notify on UNREACHABLE host states, r = notify on host recoveries (UP states), f = notify when the host starts and stops flapping, and s = send notifications when host or service scheduled downtime starts and ends. If you specify n (none) as an option, the contact will not receive any type of host notifications.
host_notification_commands	define a list of the short names of the commands used to notify the contact of a host problem or recovery. Multiple notification commands should be separated by commas. All notification commands are executed when the contact needs to be notified. The maximum amount of time that a notification command can run is controlled by the notification_timeout option.
service_notification_commands	define a list of the short names of the commands used to notify the contact of a service problem or recovery. Multiple notification commands should be separated by commas. All notification commands are executed when the contact needs to be notified. The maximum amount of time that a notification command can run is controlled by the notification_timeout option.

Optional fields¶

There are also fields that are not compulsory but can be useful. These can be found here.

Accessing Nagios remotely¶

Accessing Nagios remotely is fairly straightforward. If you enter the host's IP address, a colon and the port, followed by /nagios3, you should be taken to the Nagios page, e.g. http://127.0.0.1:8080/nagios3. For security purposes, you will be prompted to enter your user name and password.

If you do not know your IP address, you can find it with:

ipconfig (Windows)
ifconfig (Ubuntu)
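
As a small convenience, the address can be assembled on the server itself. This is a sketch with assumptions: nagios_url and first_ip are hypothetical helper names, port 8080 is simply the port used elsewhere in this HOWTO, and first_ip relies on hostname -I as shipped on Ubuntu, taking the first address it prints (which may not be the one you want on a multi-homed host).

```shell
# Hypothetical helper: build the Nagios web address.
# $1 = host name or IP address, $2 = port (defaults to 8080, as used in this HOWTO)
nagios_url() {
    printf 'http://%s:%s/nagios3\n' "$1" "${2:-8080}"
}

# Hypothetical helper: first address reported by hostname -I (Ubuntu)
first_ip() {
    hostname -I | awk '{ print $1 }'
}
```

Running nagios_url "$(first_ip)" on the Nagios server prints the address to open from another machine.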

Diagnosing an issue¶

Generating histograms¶

Histograms will give you a visual representation of the state of a service over a period of time, which can be useful in diagnosing defects.

To create a histogram, first go to the histogram creation page:

Select whether you want to generate a histogram for a host or a service. Then configure the options to suit your requirements:

After you finish configuring, the graph should be generated:

Gathering more information¶

If you click on a service, you will be taken to the service information page, where a summary is displayed. You can also take a closer look using the links at the top left of the page:

View Information For This Host
View Status Detail For This Host
View Alert History For This Service
View Trends For This Service
View Alert Histogram For This Service
View Availability Report For This Service
View Notifications For This Service