Sophie

Sophie

distrib > Mageia > 7 > i586 > media > core-updates > by-pkgid > 89c3fdeb3510d1b81107cfc380c4506d > files > 504

shinken-2.4.3-2.1.mga7.noarch.rpm

.. _advanced/internals-monitoring:

================================
 Shinken internals monitoring
================================


Introduction
=============

Shinken is able to expose many internal metrics to a statsd server, allowing to monitor its performances and its operation. It may also be useful to troubleshoot issues.

The metrics export to statsd may be controlled through parameters explained in :ref:`advanced configuration <configuration/configmain-advanced#statsd>`.

The various metrics available in statsd are described in the sections below (the metric names do not mention the configurable prefix nor hostname).

Each metric is specified
- Its name
- Its metric type
- A description telling which information it represents
- The type it is linked to, which may be used to filter the metrics to send to statsd through the `statsd_types` global attribute.

Initial connection timings
==========================

Some services establish connections to continuously exchange data. The metrics below measure the time spent to establish them.

==================== ===== ============================================================ ====
con-init.poller      timer Connection time from broker to poller                        perf
con-init.reactionner timer Connection time from broker to reactionner                   perf
con-init.receiver    timer Connection time from broker to receiver                      perf
con-init.scheduler   timer Connection time from broker/poller/reactcionner to schedule  perf
==================== ===== ============================================================ ====

con-init.poller
  Time spent to establish initial session from the broker to the poller services.

con-init.reactionner
  Time spent to establish initial session from the broker to the reactionner services.

con-init.receiver
  Time spent to establish initial session from the broker to the receiver services.

con-init.scheduler
  Time spent to establish initial session from the broker (to get objects state), the poller and the reactionner (to get actions to execute) to the scheduler services.

Hook timings
============

The shinken services executes code in hooks on different conditions (on a particular point in the workflow, on a timer, on a particular event, ...). The hook events are forwarded to modules for them to execute actions on them. The metrics below expose the time spent to execute those hooks.

======================== ===== ============================================ ====
hook.early_configuration timer Time spent in the `early_configuration` hook perf
hook.get_new_actions     timer Time spent in the `get_new_actions` hook     perf
hook.late_configuration  timer Time spent in the `late_configuration` hook  perf
hook.load_retention      timer Time spent in the `load_retention` hook      perf
hook.read_configuration  timer Time spent in the `read_configuration` hook  perf
hook.save_retention      timer Time spent in the `save_retention` hook      perf
hook.tick                timer Time spent in the `tick` hook                perf
======================== ===== ============================================ ====

hook.early_configuration
  The `early_configuration` hook is executed in the Arbiter daemon after having read the raw configuration, and before starting the deeper parsing operation. This metric exposes the time spent by Arbiter modules to react to this hook.

hook.get_new_actions
  The `get_new_actions` hook is executed in the Scheduler daemon to get actions to execute from modules (actions may be checks, notifications or event handlers). This metric exposes the time spent by Scheduler modules to react to this hook.

hook.late_configuration
  The `late_configuration` hook is executed in the Arbiter daemon after the deeper configuration parsing, and before validating it's correct. This metric exposes the time spent by Arbiter modules to react to this hook.

hook.load_retention
  The `load_retention` hook is executed in the Scheduler and Broker daemons to load retention data using the registered retention module. This metric exposes the time spent by the Schedulers and Brokers to load their retention data.

hook.save_retention
  The `save_retention` hook is executed in the Scheduler and Broker daemons to save retention data using the registered retention module. This metric exposes the time spent by the Schedulers and Brokers to save their retention data.

hook.tick
  All daemons send `tick` event each time they finish to execute a cycle in their main loop. This metric exposes the time spent by modules to react to the `tick` event.

Http communication timings
==========================

Daemons exchange data using remote execution calls based on HTTP/REST APIs. Each of the communications between the daemons are measured (server side). Depending on the request type, various metrics are calculated.

============== ===== ============================================================ ====
http.*.aqulock timer Time spent waiting for a lock (if the operation requires it) perf
http.*.args    timer Time spent parsing the request and its parameters            perf
http.*.calling timer Time spent executing the required procedure                  perf
http.*.global  timer Total time spent to execute the remote procedure             perf
http.*.json    timer Time spent encoding the result                               perf
============== ===== ============================================================ ====

Scheduling metrics
==================

The scheduler operations are measured carefully. Each operation registered in the `recurrent_works` dictionary in the Scheduler daemon is measured its execution time, and the memory increase it generated. Those values are exposed using the metrics below.

====================================== ======= ================================================ =====
loop.*                                 timer   Time spent in the operation                      perf
loop.*.mem                             counter Memory usage evolution involved by the operation perf
core.scheduler.actions.queue           gauge   The actions queue size in the Scheduler          queue
core.scheduler.checks.havetoresolvedep gauge   The check queue size in state havetoresolvedep   queue
core.scheduler.checks.inpoller         gauge   The check queue size in state inpoller           queue
core.scheduler.checks.queue            gauge   The total check queue size in the scheduler      queue
core.scheduler.checks.scheduled        gauge   The check queue size in state scheduled          queue
core.scheduler.checks.timeout          gauge   The check queue size in state timeout            queue
core.scheduler.checks.waitconsume      gauge   The check queue size in state waitconsume        queue
core.scheduler.checks.waitdep          gauge   The check queue size in state waitdep            queue
core.scheduler.checks.zombie           gauge   The check queue size in state zombie             queue
====================================== ======= ================================================ =====

loop.*
  Time spent in a particular step in the scheduler workflow.

loop.*.mem
  The memory variation involved in a particular step in the scheduler workflow.

core.scheduler.actions.queue
  The notifications and eventhandlers queue to be consumed by the reactionners

core.scheduler.checks.havetoresolvedep
  The checks count having havetoresolvedep state in the scheduler. Those checks have dependent checks that have to be checked before taking any decision.

core.scheduler.checks.inpoller
  The checks count having inpoller state in the scheduler. Those checks have been got from by a poller, and the scheduler is waiting for its result.

core.scheduler.checks.queue
  The total queue size on the Scheduler (all states).

core.scheduler.checks.scheduled
  The checks count having scheduled state in the scheduler. Those checks have to be taken by a poller.

core.scheduler.checks.timeout
  The checks count having inpoller state in the scheduler. Those checks have been got from by a poller, and the result did not came in time.

core.scheduler.checks.waitconsume
  The checks count having waitconsume state in the scheduler. Those checks have been got from by a poller, the result came in time and has to be processed by the Scheduler.

core.scheduler.checks.waitconsume
  The checks count having waitdep state in the scheduler. Those checks have dependent checks which result is required.

core.scheduler.checks.zombie
  The checks count having zombie state in the scheduler. Those checks have been totally processed and may be deleted.

Broker specific metrics
=======================

The broker receives broks emitted by the other services to manage its internal representation of the infrastructure, and forwards broks to its modules for them to do the same. The time to manage its state. Those various operation are measured and exposed through the metrics below.

================================= ===== ================================ ====
core.broker.manage-brok           timer Time to manage a single brok     perf
core.broker.put-to-external-queue timer Time to forward broks to modules perf
core.broker.get-new-broks         timer Time to forward broks to modules perf
================================= ===== ================================ ====

core.broker.manage-brok
  When broks are received, they have to be decoded and integrated in the broker configuration to update its representation of the infrastructure. This metric measures the time spent to handle a single brok.

core.broker.put-to-external-queue
  External broker modules do not benefit from broker internal state representation, and have to decode broks to do the work on their own. This metric measures the time spent to forward all the received broks to all the external modules.

core.broker.get-new-broks
  Time to get new broks from other services.

Poller/Reactionner specific metrics
===================================

============================= ======= =========================================================== =====
core.*.manage-returns         timer   Time spent by a satellite to send results to scheduler      perf
core.*.wait-ratio             gauge   **To be documented**                                        perf
core.*.timeout                gauge   **To be documented**                                        perf
core.*.worker-fork.queue-size gauge   The checks/notifications/eventhandlers execution queue size queue
core.*.actions.in             counter The number of new actions got from scheduler                queue
core.*.actions.queue          gauge   The number actions currently queued                         queue
core.*.results.out            counter The number of results returned to scheduler                 queue
core.*.results.queue          gauge   The number results currently queued                         queue
============================= ======= =========================================================== =====

core.*.manage-returns
  Time spent by the poller or reactionners to return the execution results to the scheduler.

core.*.wait-ratio
  **To be documented**

core.*.timeout
  **To be documented**

core.*.worker-fork.queue-size
  The execution queue in the poller/reactionner.

core.*.worker-fork.queue-size
  The execution queue in the poller/reactionner.

core.*.actions.in
  The number of new actions got from scheduler.

core.*.actions.queue
 The number actions currently queued

core.*.results.out
  The number of results returned to scheduler.

core.*.results.queued
  The number results currently queued

Broks related metrics
=====================

Broks transit from the satellites to broker using a pull strategy. So broks are made available on the satellites, and the active Broker fetches them. The exception is the Arbiter that sends its broks directly to the broker. The broks queue on the satellites are monitored through the metrics below.

==================================== ======= ================================================== =====
core.broker.get-new-broks.poller     timer   Time spent to fetch broks from poller              perf
core.broker.get-new-broks.reactionnertimer   Time spent to fetch broks from reactionner         perf
core.broker.get-new-broks.receiver   timer   Time spent to fetch broks from receiver            perf
core.broker.get-new-broks.scheduler  timer   Time spent to fetch broks from scheduler           perf
core.arbiter.broks.in.broker         counter Broks received by the Arbiter from the broker      queue
core.arbiter.broks.in.poller         counter Broks received by the Arbiter from the poller      queue
core.arbiter.broks.in.reactionner    counter Broks received by the Arbiter from the reactionner queue
core.arbiter.broks.in.receiver       counter Broks received by the Arbiter from the receiver    queue
core.arbiter.broks.in.scheduler      counter Broks received by the Arbiter from the scheduler   queue
core.arbiter.broks.queue             counter The broks queue size on the Arbiter                queue
core.broker.broks.queue              counter The broks queue size on the Broker                 queue
core.poller.broks.queue              counter The broks queue size on the Poller                 queue
core.reactionner.broks.queue         counter The broks queue size on the Poller                 queue
core.receiver.broks.queue            counter The broks queue size on the Receiver               queue
core.scheduler.broks.queue           counter The broks queue size on the Receiver               queue
==================================== ======= ================================================== ======

core.broker.get-new-broks.poller
  Time spent by the Broker to download, decode and integrate broks downloaded from the Poller.

core.broker.get-new-broks.reactionner
  Time spent by the Broker to download, decode and integrate broks downloaded from the Reactionner.

core.broker.get-new-broks.receiver
  Time spent by the Broker to download, decode and integrate broks downloaded from the Receiver.

core.broker.get-new-broks.scheduler
  Time spent by the Broker to download, decode and integrate broks downloaded from the Scheduler.

core.broker.arbiter.broks.in.broker
  The broks downloaded by the Arbiter from the Scheduler.

core.arbiter.broks.in.poller
  The broks downloaded by the Arbiter from the Poller

core.arbiter.broks.in.reactionner
  The broks downloaded by the Arbiter from the Reactionner

core.arbiter.broks.in.receiver
  The broks downloaded by the Arbiter from the Receiver

core.arbiter.broks.in.scheduler
  The broks downloaded by the Arbiter from the Scheduler

core.arbiter.broks.queue
  The total broks queue size in the Arbiter

core.broker.broks.queue
  The total broks queue size in the Broker

core.poller.broks.queue
  The total broks queue size in the Poller

core.reactionner.broks.queue
  The total broks queue size in the Reactionner

core.receiver.broks.queue
  The total broks queue size in the Receiver

core.scheduler.broks.queue
  The total broks queue size in the Scheduler

External commands related metrics
=================================

The external commands may emitted by any service or module. They may also be externally received and transferred by the Arbiter or the Receiver. When an external command is emitted, it goes up to the Arbiter which is able to decide to which service it should be routed. Each service is monitorred its external command queue which are exosed by the metrics below.

======================================== ===== ===================================================== ======
core.arbiter.external-commands.queue     gauge The external commands queue length on the Arbiter     queue
core.broker.external-commands.queue      gauge The external commands queue length on the Broker      queue
core.poller.external-commands.queue      gauge The external commands queue length on the Poller      queue
core.reactionner.external-commands.queue gauge The external commands queue length on the Reactionner queue
core.receiver.external-commands.queue    gauge The external commands queue length on the Receiver    queue
core.scheduler.external-commands.queue   gauge The external commands queue length on the Scheduler   queue
======================================== ===== ===================================================== =====

core.arbiter.external-commands.queue
  The external commands queue length on the Arbiter

core.broker.external-commands.queue
  The external commands queue length on the Broker

core.poller.external-commands.queue
  The external commands queue length on the Poller

core.reactionner.external-commands.queue
  The external commands queue length on the Reactionner

core.receiver.external-commands.queue
  The external commands queue length on the Receiver

core.scheduler.external-commands.queue
  The external commands queue length on the Scheduler

Memory related metrics
======================

All services are monitored their memory usage through the metrics below.

==================== ===== ================================================ ======
core.arbiter.mem     gauge The total memory used by the Arbiter service     system
core.broker.mem      gauge The total memory used by the Broker service      system
core.poller.mem      gauge The total memory used by the Poller service      system
core.reactionner.mem gauge The total memory used by the Reactionner service system
core.receiver.mem    gauge The total memory used by the Receiver service    system
core.scheduler.mem   gauge The total memory used by the Scheduler service   system
==================== ===== ================================================ ======

core.arbiter.mem
  The total memory used by the Arbiter service

core.broker.mem
  The total memory used by the Broker service

core.poller.mem
  The total memory used by the Poller service

core.reactionner.mem
  The total memory used by the Reactionner service

core.receiver.mem
  The total memory used by the Receiver service

core.scheduler.mem
  The total memory used by the Scheduler service

Managed objects
===============

Service that hold configuration objects are monitored the objects they manage through the metrics below. Note the the Arbiter holds the whole confugiration, but the schedulers may havo only a portion of it if multible active schedulers are used.

============================ ===== =========================================================== =====
core.arbiter.commands        gauge The number of Command objects managed by the Arbiter        queue
core.arbiter.contactgroups   gauge The number of Contactgroup objects managed by the Arbiter   queue
core.arbiter.contacts        gauge The number of Contact objects managed by the Arbiter        queue
core.arbiter.hostgroups      gauge The number of Hostgroup objects managed by the Arbiter      queue
core.arbiter.hosts           gauge The number of Host objects managed by the Arbiter           queue
core.arbiter.servicegroups   gauge The number of Servicegroup objects managed by the Arbiter   queue
core.arbiter.services        gauge The number of Service objects managed by the Arbiter        queue
core.scheduler.commands      gauge The number of Command objects managed by the Scheduler      queue
core.scheduler.contactgroups gauge The number of Contactgroup objects managed by the Scheduler queue
core.scheduler.contacts      gauge The number of Contact objects managed by the Scheduler      queue
core.scheduler.hostgroups    gauge The number of Hostgroup objects managed by the Scheduler    queue
core.scheduler.hosts         gauge The number of Host objects managed by the Scheduler         queue
core.scheduler.servicegroups gauge The number of Servicegroup objects managed by the Scheduler queue
core.scheduler.services      gauge The number of Service objects managed by the Scheduler      queue
============================= ===== ========================================================== =====

core.arbiter.commands
  The number of Command objects managed by the Arbiter

core.arbiter.contactgroups
  The number of Contactgroup objects managed by the Arbiter

core.arbiter.contacts
  The number of Contact objects managed by the Arbiter

core.arbiter.hostgroups
  The number of Hostgroup objects managed by the Arbiter

core.arbiter.hosts
  The number of Host objects managed by the Arbiter

core.arbiter.servicegroups
  The number of Servicegroup objects managed by the Arbiter

core.arbiter.services
  The number of Service objects managed by the Arbiter

core.scheduler.commands
  The number of Command objects managed by the Scheduler

core.scheduler.contactgroups
  The number of Contactgroup objects managed by the Scheduler

core.scheduler.contacts
  The number of Contact objects managed by the Scheduler

core.scheduler.hostgroups
  The number of Hostgroup objects managed by the Scheduler

core.scheduler.hosts
  The number of Host objects managed by the Scheduler

core.scheduler.servicegroups
  The number of Servicegroup objects managed by the Scheduler

core.scheduler.services
  The number of Service objects managed by the Scheduler