9.3.1.6. show health

A - Introduction

The `health` command of the `show` sub-group displays statistics and health information about the GCap.


B - Prerequisites

  • User: setup, gviewadm

  • Dependencies: N/A


C - Command

`show health`


D - Procedure

The command prompt is displayed:

(gcap-cli)

  1. Enter the command:

    show health

  2. Validate.
    The system displays the following information:
  • `block` counters - Mass storage statistics

  • `cpu_stats` counters - Processor statistics

  • `disks` counters - Mount point occupancy statistics

  • `emergency` counters - GCap emergency mode information

  • `gcenter` counters - Paired GCenter information

  • `high_availability` counters - High Availability (HA) information

  • `interfaces` counters - Statistics on network interfaces

  • `loadavg` counters - Statistics on the average load of the GCap

  • `meminfo` counters - Statistics on the RAM

  • `numastat` counters - Non-Uniform Memory Access (NUMA) node statistics

  • `quotas` counters - Quota information

  • `softnet` counters - Statistics on received packets per processor core

  • `suricata` counters - Sigflow (monitoring-engine) information

  • `systemd` counters - System initialization information

  • `uptime` counters - Uptime

  • `virtualmemory` counters - Swap space information (swap)


E - Details of `block` counters - Mass storage statistics

  • `sdN` - Statistics for disk N, where N is a letter of the alphabet

    • `read_bytes` - Bytes read since startup

    • `written_bytes` - Bytes written since startup

Example :

{
    "block": {
        "sda": {
            "read_bytes": 302867968,
            "written_bytes": 4837645312
        },
        "sdb": {
            "read_bytes": 3894272,
            "written_bytes": 4096
        }
    }
}

F - Details of `cpu_stats` counters - Processor statistics

  • `cpus` - CPU usage statistics

    • `cpu` - Aggregate usage statistics for all cores (same counters as `cpuX`)

    • `cpuX` - Usage statistics for core X

      • `idle` - Time spent idle, in milliseconds

      • `iowait` - Time spent waiting for disk operations, in milliseconds

      • `irq` - Time spent servicing hardware IRQs, in milliseconds

      • `nice` - Time spent in user space on low-priority processes, in milliseconds

      • `softirq` - Time spent servicing software IRQs, in milliseconds

      • `system` - Time spent in kernel space, in milliseconds

      • `user` - Time spent in user space, in milliseconds

  • `interrupts` - Number of interrupts since startup

  • `processes_blocked` - Number of blocked or dead processes

  • `processes_running` - Number of running processes

Example :

"cpu_stats": {
   "cpus": {
       "cpu": {
           "idle": 961816208,
           "iowait": 11419,
           "irq": 0,
           "nice": 0,
           "softirq": 397899,
           "system": 21788203,
           "user": 50806194
       },
       "cpu0": {
           "idle": 79960857,
           "iowait": 985,
           "irq": 0,
           "nice": 0,
           "softirq": 234748,
           "system": 1795880,
           "user": 4357374
       },
       "cpu1": {
           "idle": 80166571,
           "iowait": 951,
           "irq": 0,
           "nice": 0,
           "softirq": 88078,
           "system": 1830370,
           "user": 4138182
       }
   },
   "interrupts": 12942835029,
   "processes_blocked": 0,
   "processes_running": 1
}
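
As an illustration of how these counters can be combined, the sketch below computes the share of time a core spent outside `idle` and `iowait`. It assumes the counters of one `cpu` or `cpuX` entry are already available as a Python dictionary; the function name `busy_ratio` is hypothetical and is not part of the GCap CLI. Because the counters are cumulative, the result is an average over the whole uptime and not an instantaneous load.

```python
def busy_ratio(counters):
    """Fraction of accounted time not spent in `idle` or `iowait`."""
    total = sum(counters.values())
    if total == 0:
        return 0.0
    waiting = counters["idle"] + counters["iowait"]
    return (total - waiting) / total


# Values copied from the `cpu0` entry of the example above.
cpu0 = {
    "idle": 79960857, "iowait": 985, "irq": 0, "nice": 0,
    "softirq": 234748, "system": 1795880, "user": 4357374,
}

print(f"cpu0 busy: {busy_ratio(cpu0):.2%}")
```

To estimate the load over a given interval instead, the same calculation can be applied to the difference between two successive readings.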

G - Details of `disks` counters - Mount point occupancy statistics

  • `/mountpoint/path` - Mount point path

    • `block_free` - Number of free blocks

    • `block_total` - Total number of blocks

    • `inode_free` - Number of remaining inodes

    • `inode_total` - Total number of inodes

Example :

"disks": {
    "/": {
        "block_free": 247909,
        "block_total": 249830,
        "inode_free": 64258,
        "inode_total": 65536
    },
    "/data": {
        "block_free": 7150076,
        "block_total": 7161801,
        "inode_free": 1827417,
        "inode_total": 1827840
    }
}
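
The free and total counters can be turned into occupancy percentages. The following sketch assumes the `disks` object has been loaded into a Python dictionary; `usage_percent` is a hypothetical helper name used only for this illustration.

```python
def usage_percent(free, total):
    """Percentage of a resource in use, given its free and total counts."""
    if total == 0:
        return 0.0
    return 100.0 * (total - free) / total


# Values copied from the example above.
disks = {
    "/": {"block_free": 247909, "block_total": 249830,
          "inode_free": 64258, "inode_total": 65536},
    "/data": {"block_free": 7150076, "block_total": 7161801,
              "inode_free": 1827417, "inode_total": 1827840},
}

for mount, c in disks.items():
    blocks = usage_percent(c["block_free"], c["block_total"])
    inodes = usage_percent(c["inode_free"], c["inode_total"])
    print(f"{mount}: blocks {blocks:.2f}% used, inodes {inodes:.2f}% used")
```

Checking inodes as well as blocks matters because a file system can run out of inodes while blocks are still free.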

H - Details of `emergency` counters - GCap emergency mode information

  • `emergency_active` - Active or inactive status of the emergency mode

Example :

"emergency": {
    "emergency_active": false
},

I - Details of `gcenter` counters - Paired GCenter information

  • `chronyc_sync` - Status of the NTP synchronization with the GCenter

  • `reachable` - GCenter reachable (true) or not (false)

Example :

"gcenter": {
    "chronyc_sync": false,
    "reachable": false
},

J - Details of `high_availability` counters - High Availability (HA) information

This feature is deprecated.
These counters are not significant.

  • `healthy` - HA health status

  • `last_status` - Last known HA status

  • `last_transition` - Date of the last known HA status change, in ISO 8601 format

  • `leader` - True for a GCap leader, false for a GCap follower

  • `status` - Active (true) or inactive (false) status of the HA

Example :

"high_availability": {
    "healthy": false,
    "last_status": -1,
    "last_transition": "0001-01-01T00:00:00Z",
    "leader": false,
    "status": false
},

K - Details of `interfaces` counters - Statistics on network interfaces

  • `mon0` - Network interface name

    • `rx_bytes` - Number of bytes received

    • `rx_drops` - Number of packets dropped on reception

    • `rx_errs` - Number of receive errors

    • `rx_packets` - Total number of packets received on this interface

    • `tx_bytes` - Number of bytes sent

    • `tx_drops` - Number of packets dropped while sending

    • `tx_errs` - Number of transmit errors

    • `tx_packets` - Total number of packets sent from this interface

Example :

"interfaces": {
    "mon0": {
        "rx_bytes": 0,
        "rx_drops": 0,
        "rx_errs": 0,
        "rx_packets": 0,
        "tx_bytes": 0,
        "tx_drops": 0,
        "tx_errs": 0,
        "tx_packets": 0
    },
    "tunnel": {
        "rx_bytes": 138433006,
        "rx_drops": 82901,
        "rx_errs": 0,
        "rx_packets": 2143236,
        "tx_bytes": 796294,
        "tx_drops": 0,
        "tx_errs": 0,
        "tx_packets": 3635
    },
    "management": {
        "rx_bytes": 137642525,
        "rx_drops": 82902,
        "rx_errs": 0,
        "rx_packets": 2135060,
        "tx_bytes": 0,
        "tx_drops": 0,
        "tx_errs": 0,
        "tx_packets": 0
    }
}

Note

Here the interfaces are identified by their labels (`mon0`, `tunnel`, `management`).
Reminder: in the management-tunnel role, the interface is displayed as `management`.
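
A common use of these counters is to relate drops to received traffic. The sketch below is only an illustration: it assumes the counters of one interface are available as a Python dictionary, and it expresses `rx_drops` relative to `rx_packets` without assuming whether dropped packets are included in `rx_packets`. The name `rx_drop_ratio` is hypothetical.

```python
def rx_drop_ratio(counters):
    """Ratio of dropped to received packets for one interface."""
    received = counters["rx_packets"]
    if received == 0:
        return 0.0
    return counters["rx_drops"] / received


# Values copied from the example above.
interfaces = {
    "mon0": {"rx_drops": 0, "rx_packets": 0},
    "tunnel": {"rx_drops": 82901, "rx_packets": 2143236},
}

for name, counters in interfaces.items():
    print(f"{name}: {rx_drop_ratio(counters):.2%} dropped")
```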

L - Details of `loadavg` counters - Statistics on the average load of the GCap

  • `active_processes` - Total number of processes currently existing on the system

  • `load_average_15_mins` - Average load over the last fifteen minutes

  • `load_average_1_min` - Average load over the last minute

  • `load_average_5_mins` - Average load over the last five minutes

  • `running_processes` - Number of running processes

Example :

"loadavg": {
    "active_processes": 561,
    "load_average_15_mins": 0.99,
    "load_average_1_min": 0.67,
    "load_average_5_mins": 1,
    "running_processes": 2
}
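
Load averages are easier to interpret when divided by the number of cores. The sketch below derives the core count from the `cpuX` entries of the `cpu_stats` counters and normalizes the three averages. It assumes both objects are available as Python dictionaries; `load_per_core` is a hypothetical helper, not a GCap function.

```python
def load_per_core(loadavg, cpu_stats):
    """Divide each load average by the number of `cpuX` entries."""
    # The aggregate "cpu" entry has no numeric suffix and is skipped.
    cores = sum(1 for name in cpu_stats["cpus"] if name[3:].isdigit())
    if cores == 0:
        raise ValueError("no per-core entries found")
    keys = ("load_average_1_min", "load_average_5_mins", "load_average_15_mins")
    return {key: loadavg[key] / cores for key in keys}


# Load values from the example above; a two-core layout is assumed here.
loadavg = {"load_average_1_min": 0.67, "load_average_5_mins": 1,
           "load_average_15_mins": 0.99}
cpu_stats = {"cpus": {"cpu": {}, "cpu0": {}, "cpu1": {}}}

print(load_per_core(loadavg, cpu_stats))
```

A normalized value that stays above 1 for a long period suggests that more work is queued than the cores can absorb.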

M - Details of `meminfo` counters - Statistics on the RAM

  • `available` - Memory available for starting new applications without swapping, in kilobytes

  • `buffers` - Memory used by disk operations in kilobytes

  • `cached` - Memory used by the cache in kilobytes

  • `dirty` - Memory used by pending write operations in kilobytes

  • `free` - Unused memory in kilobytes

  • `hugepages_anonymous` - Number of anonymous transparent huge pages used

  • `hugepages_free` - Number of available transparent huge pages

  • `hugepages_reserved` - Number of reserved transparent huge pages

  • `hugepages_shmem` - Number of shared transparent huge pages

  • `hugepages_surplus` - Number of extra transparent huge pages

  • `hugepages_total` - Total number of huge pages

  • `kernel_stack` - Memory used by kernel stack allocations in kilobytes

  • `page_tables` - Memory used for page management in kilobytes

  • `s_reclaimable` - Cache memory that can be reallocated in case of memory shortage, in kilobytes

  • `shmem` - Memory used by shared pages in kilobytes

  • `slab` - Memory used by kernel data structures in kilobytes

  • `swap_cached` - Memory used by the swap cache in kilobytes

  • `swap_free` - Available swap memory in kilobytes

  • `swap_total` - Total swap memory in kilobytes

  • `total` - Total memory in kilobytes

  • `v_malloc_used` - Memory used by large memory areas allocated by the kernel, in kilobytes

Example :

"meminfo": {
    "available": 13608896,
    "buffers": 380932,
    "cached": 1155824,
    "dirty": 28,
    "free": 13128080,
    "hugepages_anonymous": 423936,
    "hugepages_free": 0,
    "hugepages_reserved": 0,
    "hugepages_shmem": 0,
    "hugepages_surplus": 0,
    "hugepages_total": 0,
    "kernel_stack": 9152,
    "page_tables": 8400,
    "s_reclaimable": 43168,
    "shmem": 794564,
    "slab": 210008,
    "swap_cached": 0,
    "swap_free": 16777212,
    "swap_total": 16777212,
    "total": 15977468,
    "v_malloc_used": 66592
},
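
A rough memory occupancy figure can be derived from `total` and `available`. The sketch below assumes that `available` reports memory that can still be allocated, which the example values (where `available` is lower than `total`) suggest, and that the `meminfo` object is available as a Python dictionary. `memory_used_percent` is a hypothetical helper name.

```python
def memory_used_percent(meminfo):
    """Percentage of RAM that is not reported as available."""
    total = meminfo["total"]
    if total == 0:
        return 0.0
    return 100.0 * (total - meminfo["available"]) / total


# Values copied from the example above.
meminfo = {"total": 15977468, "available": 13608896}

print(f"RAM in use: {memory_used_percent(meminfo):.1f}%")
```

Using `available` instead of `free` avoids counting reclaimable cache as used memory.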
    

N - Details of `numastat` counters - Non-Uniform Memory Access (NUMA) node statistics

  • `nodes` - List of NUMA nodes

    • `nodeX` - NUMA X node statistics

      • `interleave_hit` - Interleaved memory successfully allocated in this node

      • `local_node` - Memory allocated in this node while a process was running on it

      • `numa_foreign` - Memory planned for this node, but currently allocated in a different node

      • `numa_hit` - Memory successfully allocated in this node as expected

      • `numa_miss` - Memory allocated in this node despite process preferences.
        Each `numa_miss` event has a corresponding `numa_foreign` event in another node.

      • `other_node` - Memory allocated in this node while a process was running in another node

Example :

"numastat": {
    "nodes": {
        "node0": {
            "interleave_hit": 3871,
            "local_node": 4410557829,
            "numa_foreign": 0,
            "numa_hit": 4410454203,
            "numa_miss": 0,
            "other_node": 14170
        },
        "node1": {
            "interleave_hit": 3869,
            "local_node": 4224990850,
            "numa_foreign": 0,
            "numa_hit": 4224964539,
            "numa_miss": 0,
            "other_node": 21531
        }
    }
},
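
One way to summarize these counters is a per-node hit ratio, that is, the proportion of allocations that landed on the intended node. The sketch below is an illustration under the assumption that `numa_hit` and `numa_miss` together cover all allocations made on a node; `numa_hit_ratio` is a hypothetical helper name.

```python
def numa_hit_ratio(node):
    """Share of allocations on this node that were intended for it."""
    attempts = node["numa_hit"] + node["numa_miss"]
    if attempts == 0:
        return None  # no allocation recorded yet
    return node["numa_hit"] / attempts


# Values copied from the example above.
nodes = {
    "node0": {"numa_hit": 4410454203, "numa_miss": 0},
    "node1": {"numa_hit": 4224964539, "numa_miss": 0},
}

for name, node in nodes.items():
    print(name, numa_hit_ratio(node))
```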

O - Details of `quotas` counters - Quota statistics by category

  • `quotas` - Quota list

    • `by_gid` - Statistics sorted by group (gid identifier)

    • `by_prj` - Statistics sorted by project (prj identifier)

    • `by_uid` - Statistics sorted by user (uid identifier)

In each category, the following counters are displayed:

  • `block_grace` - Grace period for blocks

  • `block_hard_limit` - Hard limit on blocks.
    Sets an absolute limit on space usage.
    The user cannot exceed this limit.
    Beyond this limit, writing to the file system is forbidden.

  • `block_soft_limit` - Soft limit on blocks.
    Specifies the maximum amount of space a user can occupy on the file system.
    When this limit is reached, the user receives warnings that the assigned quota has been exceeded.
    When combined with a grace period, a user who still exceeds the soft limit after the grace period has elapsed is treated as if the hard limit had been reached.

  • `block_used` - Number of blocks used

  • `file_grace` - Grace period for files

  • `file_hard_limit` - Hard limit on files.
    Sets an absolute limit on the number of files.
    The user cannot exceed this limit.
    Beyond this limit, creating files on the file system is forbidden.

  • `file_soft_limit` - Soft limit on files.
    Specifies the maximum number of files a user can own on the file system.
    When this limit is reached, the user receives warnings that the assigned quota has been exceeded.
    When combined with a grace period, a user who still exceeds the soft limit after the grace period has elapsed is treated as if the hard limit had been reached.

  • `file_used` - Number of files used

Example :

"quotas": {
     "by_gid": {
         "0": {
             "block_grace": "0",
             "block_hard_limit": "0",
             "block_soft_limit": "0",
             "block_used": "2148952",
             "file_grace": "0",
             "file_hard_limit": "0",
             "file_soft_limit": "0",
             "file_used": "177"
         },
         "10012": {
             "block_grace": "0",
             "block_hard_limit": "0",
             "block_soft_limit": "0",
             "block_used": "5216",
             "file_grace": "0",
             "file_hard_limit": "0",
             "file_soft_limit": "0",
             "file_used": "295"
         }
     },
     "by_prj": {
         "0": {
             "block_grace": "0",
             "block_hard_limit": "0",
             "block_soft_limit": "0",
             "block_used": "51600",
             "file_grace": "0",
             "file_hard_limit": "0",
             "file_soft_limit": "0",
             "file_used": "225"
         },
         "1": {
             "block_grace": "0",
             "block_hard_limit": "7980499",
             "block_soft_limit": "7980499",
             "block_used": "2101904",
             "file_grace": "0",
             "file_hard_limit": "1000",
             "file_soft_limit": "1000",
             "file_used": "43"
         }
     },
     "by_uid": {
         "0": {
             "block_grace": "0",
             "block_hard_limit": "0",
             "block_soft_limit": "0",
             "block_used": "2153356",
             "file_grace": "0",
             "file_hard_limit": "0",
             "file_soft_limit": "0",
             "file_used": "269"
         },
         "10012": {
             "block_grace": "0",
             "block_hard_limit": "0",
             "block_soft_limit": "0",
             "block_used": "1032",
             "file_grace": "0",
             "file_hard_limit": "0",
             "file_soft_limit": "0",
             "file_used": "258"
         }
     }
  }

The example below shows an entry without defined limits: the value "0" indicates that no limit or grace time is set.

"10012": {
     "block_grace": "0",
     "block_hard_limit": "0",
     "block_soft_limit": "0",
     "block_used": "1032",
     "file_grace": "0",
     "file_hard_limit": "0",
     "file_soft_limit": "0",
     "file_used": "258"
},
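
Since the quota values are returned as strings and "0" means that no limit is set, a consumer has to convert them and handle the unlimited case explicitly. The sketch below shows one way to do this for the block hard limit; it assumes one quota entry is available as a Python dictionary, and `block_usage_percent` is a hypothetical helper name.

```python
def block_usage_percent(entry):
    """Percentage of the hard block limit in use, or None without a limit."""
    limit = int(entry["block_hard_limit"])
    if limit == 0:  # "0" means that no limit is defined
        return None
    return 100.0 * int(entry["block_used"]) / limit


# Entries copied from the examples above (project "1" and identifier "10012").
project_1 = {"block_hard_limit": "7980499", "block_used": "2101904"}
no_limit = {"block_hard_limit": "0", "block_used": "1032"}

print(block_usage_percent(project_1))
print(block_usage_percent(no_limit))
```

The same pattern applies to `file_hard_limit` and `file_used`.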

P - Details of `softnet` counters - Statistics on received packets per processor core

  • `cpus` - Usage statistics per CPU

    • `cpuX` - Statistics for core X

      • `backlog_len` - Current length of the packet backlog queue

      • `dropped` - Number of packets dropped

      • `flow_limit_count` - Number of times the flow limit was reached

      • `processed` - Number of packets processed

      • `received_rps` - Number of times the CPU was woken up to process packets through Receive Packet Steering (RPS)

      • `time_squeeze` - Number of times the thread could not process all the packets in its backlog within the allocated budget

  • `summed` - Aggregate statistics for all cores

    • `backlog_len` - Current length of the packet backlog queue

    • `dropped` - Number of packets dropped

    • `flow_limit_count` - Number of times the flow limit was reached

    • `processed` - Number of packets processed

    • `received_rps` - Number of times a CPU was woken up to process packets through Receive Packet Steering (RPS)

    • `time_squeeze` - Number of times a thread could not process all the packets in its backlog within the allocated budget

Example :

"softnet": {
    "cpus": {
        "cpu0": {
            "backlog_len": 0,
            "dropped": 0,
            "flow_limit_count": 0,
            "processed": 448550,
            "received_rps": 0,
            "time_squeeze": 2
        },
        "cpu1": {
            "backlog_len": 0,
            "dropped": 0,
            "flow_limit_count": 0,
            "processed": 36250,
            "received_rps": 0,
            "time_squeeze": 0
        }
    },
    "summed": {
        "backlog_len": 0,
        "dropped": 0,
        "flow_limit_count": 0,
        "processed": 5239450,
        "received_rps": 0,
        "time_squeeze": 27
    }
},
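
Non-zero `dropped` or `time_squeeze` values indicate cores that could not keep up with incoming packets at some point. The sketch below lists such cores; it assumes the `softnet` object is available as a Python dictionary, and `cores_under_pressure` is a hypothetical helper name.

```python
def cores_under_pressure(softnet):
    """Names of cores with at least one drop or time squeeze."""
    return sorted(
        name
        for name, counters in softnet["cpus"].items()
        if counters["dropped"] > 0 or counters["time_squeeze"] > 0
    )


# Values copied from the example above (only the fields used here).
softnet = {
    "cpus": {
        "cpu0": {"dropped": 0, "time_squeeze": 2},
        "cpu1": {"dropped": 0, "time_squeeze": 0},
    }
}

print(cores_under_pressure(softnet))
```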

Q - Details of `suricata` counters - Sigflow (monitoring-engine) information

  • `detailed_status` - Sigflow container status

  • `up` - Status of Sigflow and the detection engine

The combination of `detailed_status` and `up` is interpreted as follows:

  • `Container down` + `up` false - Engine off

  • `Container down` + `up` true - Impossible state: the engine cannot run in a stopped container

  • `Container UP` + `up` false - Unstable state: contact GATEWATCHER support

  • `Container UP` + `up` true - Engine on

Example :

"suricata": {
    "detailed_status": "Container down",
    "up": false
},
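
The four combinations of `detailed_status` and `up` can be encoded as a small lookup. The sketch below is an illustration only: it assumes `detailed_status` takes the values `Container UP` or `Container down`, treats any other value as a stopped container, and uses the hypothetical function name `engine_state`.

```python
def engine_state(suricata):
    """Map the `detailed_status` and `up` pair to a readable state."""
    # Assumption: anything other than "Container UP" means the container is down.
    container_up = suricata["detailed_status"].lower() == "container up"
    engine_up = bool(suricata["up"])
    if container_up and engine_up:
        return "engine on"
    if container_up:
        return "unstable state: contact support"
    if engine_up:
        return "impossible state"
    return "engine off"


# Values copied from the example above.
print(engine_state({"detailed_status": "Container down", "up": False}))
```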

R - Details of `systemd` counters - System initialization information

  • `failed_services` - List of failed services, as reported by `systemctl --failed`.

Example :

"systemd": {
    "failed_services": [ "netdata.service" ]
},

S - Details of `uptime` counters - Uptime

  • `up_seconds` - Number of seconds since startup.

Example :

"uptime": {
    "up_seconds": 874179.8
},
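
The `up_seconds` value is easier to read once converted to days, hours, minutes and seconds. The sketch below shows one possible conversion using the Python standard library; `format_uptime` is a hypothetical helper name, and fractions of a second are discarded.

```python
from datetime import timedelta


def format_uptime(up_seconds):
    """Render a number of seconds as days, hours, minutes and seconds."""
    delta = timedelta(seconds=int(up_seconds))
    hours, remainder = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{delta.days}d {hours:02d}h {minutes:02d}m {seconds:02d}s"


# Value copied from the example above.
print(format_uptime(874179.8))  # prints "10d 02h 49m 39s"
```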

T - Details of `virtualmemory` counters - Swap space information (swap)

  • `disk_in` - Number of pages read in from disk since startup.

  • `disk_out` - Number of pages written out to disk since startup.

  • `pagefaults_major` - Number of major page faults since startup, that is, faults that required loading a memory page from disk to RAM.

  • `pagefaults_minor` - Number of minor page faults since startup, that is, faults resolved without disk access.

  • `swap_in` - Number of kilobytes the system swapped from disk to RAM per second.

  • `swap_out` - Number of kilobytes the system swapped from RAM to disk per second.

Example :

"virtualmemory": {
    "disk_in": 307828,
    "disk_out": 4724267,
    "pagefaults_major": 1210,
    "pagefaults_minor": 14233474300,
    "swap_in": 0,
    "swap_out": 0
}