Lithos submits metrics via a cantal-compatible protocol.
All metrics usually belong to lithos’s cgroup, so for example in graphite
you can find them under
Or you cand find them without this prefix in
http://hostname:22682/local/process_metrics without a prefix.
In the following description we skip the common prefix and only show metric names.
Metrics of lithos master process:
master.restarts(counter) amount of restarts of a master process. Usually restart equals to configuration reload via
lithos_switchor any other way.
master.sandboxes(gauge) number of sandboxes configured
master.containers(gauge) number of containers (processes) conigured
master.queue(gauge) length of the internal queue, the queue consists of processes to run and hanging processes to kill
processes.<sandbox_name>.<process_name>.started– (counter) number of times process have been started
processes.<sandbox_name>.<process_name>.deaths– (counter) number of times process have exited for any reason
processes.<sandbox_name>.<process_name>.failures– (counter) number of times process have exited for failure reason, for whatever reason lithos thinks it was failure. See Determining Failure
processes.<sandbox_name>.<process_name>.running– (gauge) number of procesess that are currently running (was started but not yet found to be exited)
Global metrics for all sandboxes and containers:
containers.started– (counter) same as for
processes.*but for all containers
containers.deaths– (counter) see above
containers.failures– (counter) see above
containers.running– (gauge) see above
containers.unknown– (gauge) number of child processes of lithos that are found to be running but do not belong to any of the process groups known to lithos (they are being killed, and they are probably from deleted configs)
Currently there are two kinds of process death that are considered non-failures:
- Processes that had been sent
SIGTERMsignal to (with any exit status) or ones dead on
SIGTERMsignal are considered non-failed.
- Processes exited with one of the exit codes specified in