Overview Cluster Monitoring

Monitoring services are managed using Services > Monitoring in EXAoperation. You can create and configure monitoring services, and you can set the warning and error threshold values for disk usage, swap usage and load.

EXACluster Monitoring Services

You can create log services that include cluster events such as:

  • EXAoperation events
  • Database events
  • EXAStorage events
  • Load on cluster nodes
  • Cluster process events
  • Authentication events

You can also forward log service messages to other monitoring systems.

The following table describes the properties of a log service.

Property: Description

Minimum Log Priority: The minimum syslog severity that should be included in the log service. The available log priorities (and their meanings) are:

  • Error (an error has occurred)
  • Warning (an error could occur if no action is taken)
  • Notice (unusual events have occurred, but there is no error)
  • Information (all events are reported)

You specify the minimum severity that the log service will report on. For example, if you select 'Warning', only warning and error events are shown, but no information or notice events.

EXAClusterOS Services: The services to include in the log service:

  • EXAoperation - information about cluster processes, such as the boot process of a node
  • DWAd - information about EXASolution instances, such as startup of a database
  • Lockd - required for DWAd
  • Load - events regarding CPU load on every server
  • Storage - information about the state of the EXAStorage layer
  • Authentication - information about authentication events, such as failed login attempts into EXAoperation or failed SSH login attempts

Database Systems: The databases that should be reported on by the log service.

Remote Syslog Server: The IP address or DNS name of the remote server if you forward the log service to an external server.

Remote Syslog Protocol: The network protocol to use if you forward the log service to an external server.

Default Time Interval: The default period of time for which the log service will display events. Enter a value followed by a unit of time (e.g. '1d', '2h', '15m', '100s').

Description: A description of the log service.
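To illustrate how the Minimum Log Priority and Default Time Interval properties interact, the following is a minimal Python sketch. It is not EXAoperation code: the function names, the event dictionaries, and the assumption that the interval is a whole number followed by one of the units d, h, m, or s (inferred from the examples above) are all hypothetical.

```python
import re
from datetime import datetime, timedelta

# Priorities from the table above, ordered from most to least severe.
PRIORITIES = ["Error", "Warning", "Notice", "Information"]

# Unit suffixes taken from the examples '1d', '2h', '15m', '100s' (assumed set).
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_interval(text):
    """Convert a string such as '15m' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([dhms])", text.strip())
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    value, unit = match.groups()
    return timedelta(seconds=int(value) * UNIT_SECONDS[unit])


def is_reported(event_priority, minimum_priority):
    """Return True if the event is at least as severe as the minimum priority."""
    return PRIORITIES.index(event_priority) <= PRIORITIES.index(minimum_priority)


def visible_events(events, minimum_priority, interval, now):
    """Keep events that pass the priority filter and fall inside the interval."""
    cutoff = now - parse_interval(interval)
    return [
        event
        for event in events
        if is_reported(event["priority"], minimum_priority)
        and event["time"] >= cutoff
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    sample = [
        {"priority": "Error", "time": now - timedelta(minutes=5)},
        {"priority": "Notice", "time": now - timedelta(minutes=5)},
        {"priority": "Warning", "time": now - timedelta(hours=3)},
    ]
    # With 'Warning' and '10m', only the recent Error event remains.
    print(visible_events(sample, "Warning", "10m", now))
```

With a minimum priority of 'Warning', the Notice event is dropped by severity and the three-hour-old Warning event is dropped by the ten-minute interval.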

The best practice is to have at least two log services with a minimum log priority of Information, one that includes LOAD events, and one that excludes LOAD events. This is because the volume of LOAD events could obscure reporting from other events. The setup for both log services can be:

INFORMATION (WITH LOAD)

  • Minimum Log Priority: Information
  • EXAClusterOS Services: ALL
  • Database Systems: ALL
  • Default Time Interval: 10m
  • Description: Information ALL

INFORMATION (WITHOUT LOAD)

  • Minimum Log Priority: Information
  • EXAClusterOS Services: ALL except LOAD
  • Database Systems: ALL
  • Default Time Interval: 10m
  • Description: Information ALL
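The two recommended log services differ only in whether the Load service is included. The following Python sketch models that difference as plain data. The dictionary layout and key names are hypothetical and do not correspond to an EXAoperation API; the service names are taken from the property table above.

```python
import copy

# Service names as listed under "EXAClusterOS Services" in the property table.
ALL_SERVICES = ["EXAoperation", "DWAd", "Lockd", "Load", "Storage", "Authentication"]

# Hypothetical representation of the "INFORMATION (WITH LOAD)" log service.
with_load = {
    "minimum_log_priority": "Information",
    "services": list(ALL_SERVICES),
    "database_systems": "ALL",
    "default_time_interval": "10m",
}

# "INFORMATION (WITHOUT LOAD)" is the same definition minus the Load service.
without_load = copy.deepcopy(with_load)
without_load["services"] = [s for s in ALL_SERVICES if s != "Load"]

if __name__ == "__main__":
    print(with_load["services"])
    print(without_load["services"])
```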

Threshold Values

You can set error and warning threshold values for the following:

Threshold: Description

Disk usage: The warning and error threshold values for disk usage, expressed as a percentage. Disk usage thresholds apply only to filesystems such as OS and DATA, and not to EXAStorage.

Swap usage: The warning and error threshold values for swap usage.

Load: The warning and error threshold values for CPU load in the cluster.

A good starting point for the load warning threshold is the following calculation: number of threads per data node x 1.5 = warning threshold.

For example, if each data node has 2 sockets with 6 cores each and hyperthreading enabled (2 threads per core), the calculation is:

2 x 6 x 2 = 24 threads, and 24 x 1.5 = 36 as the warning threshold
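The load threshold calculation above can be expressed as a small Python helper. This is a sketch only: the function name and parameters are hypothetical, and the default of 2 threads per core assumes hyperthreading as in the example.

```python
def load_warning_threshold(sockets, cores_per_socket, threads_per_core=2, factor=1.5):
    """Return the suggested load warning threshold for one data node.

    threads_per_core=2 assumes hyperthreading; use 1 if it is disabled.
    factor=1.5 is the multiplier from the starting-point calculation.
    """
    threads = sockets * cores_per_socket * threads_per_core
    return threads * factor


if __name__ == "__main__":
    # 2 sockets x 6 cores x 2 threads = 24 threads; 24 x 1.5 = 36
    print(load_warning_threshold(2, 6))  # prints 36.0
```

If you prefer to read the thread count from a running node instead of the hardware specification, Python's `os.cpu_count()` reports the number of logical CPUs and can replace the multiplication of sockets, cores, and threads.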

Service States

Service States provides an overview of the cluster services.

During the installation process, all services will be shown as 'OK' except for Storage. This is because EXAStorage does not start automatically, and is not yet configured.

Although the time is synchronized automatically across the cluster at regular intervals, you can force a manual synchronization using the Synchronize Time button. See Synchronize Time with NTP Server (optional).