Default AlertManager rules in Kublr

Alert Name                                  Description
authGetTokenFail                            Rate of failed token requests for a service is elevated on an instance over the last 3 min
clusterCpuUsageHigh                         CPU usage on the cluster is higher than 80% for 8 min
clusterMemoryUsageHigh                      Memory usage on the cluster is higher than 90% for 5 min
clusterMemoryUsageLow                       Memory usage on the cluster is lower than 60% for 5 min
daemonSetMisscheduledNumberHigh             Number of misscheduled pods is high in cluster for DaemonSet for 7 min
daemonSetReadyNumberLow                     Number of ready pods is low in cluster for DaemonSet for 7 min
daemonSetScheduledNumberLow                 Number of scheduled pods is low in cluster for DaemonSet for 7 min
deploymentAvailableNumberLow                Number of available replicas is low in cluster for Deployment for 7 min
deploymentReplicaNumberLow                  Number of replicas is low in cluster for Deployment for 7 min
deploymentUnavailableNumberHigh             Number of unavailable replicas is high in cluster for Deployment for 7 min
instanceCpuUsageHigh                        CPU usage on instance is higher than 80% for 8 min
instanceDiskInodesFreeLow                   Device on instance has few inodes left for over 10 min
instanceDiskSpaceFreeLow                    Device on instance has less than 10% of free space for over 10 min
instanceDown                                An instance in the cluster is down for over 1 min
instanceMemoryUsageHigh                     Memory usage on instance is higher than 95% for 5 min
instanceSwapUsageHigh                       Swap memory exists on one or more instances. Swap usage is not recommended
k8sApiServerDown                            Kubernetes API server is down for over 1 min in cluster
KubeApiServerAbsent                         No kube-apiservers are available in cluster for 1 min
kubeletDockerOperationErrors                Docker operation error rate is elevated on instance over the last 10 min
KubeMetricServerFailure                     Kube-metric-server is unavailable in cluster for 1 min
KubePersistentVolumeFullInFourDays          The persistent volume is expected to fill up within four days
nodeStatusCondition                         Node condition is not ready for 7 min in cluster
nodeStatusNotReady1                         Node status is not ready for 7 min in cluster
podContainerRestarting                      Pod container is restarting for 7 min
podContainerWaiting                         Pod container is waiting for 7 min
podPhaseIncorrect                           Pod is stuck in a wrong phase for 7 min
podStatusNotReady                           Pod is not ready for 7 min
podStatusNotScheduled                       Pod is not scheduled for 7 min
promRuleEvalFailures                        Prometheus failed to evaluate rule in cluster
replicaSetFullyLabeledNumberLow             Number of fully labeled replicas is low in cluster for ReplicaSet for 7 min
replicaSetReadyNumberLow                    Number of ready replicas is low in cluster for ReplicaSet for 7 min
replicaSetReplicaNumberLow                  Number of replicas is low in cluster for ReplicaSet for 7 min
replicationControllerAvailableNumberLow     Number of available replicas is low in cluster for ReplicationController for 7 min
replicationControllerFullyLabeledNumberLow  Number of fully labeled replicas is low in cluster for ReplicationController for 7 min
replicationControllerReadyNumberLow         Number of ready replicas is low in cluster for ReplicationController for 7 min
replicationControllerReplicaNumberLow       Number of replicas is low in cluster for ReplicationController for 7 min
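
For orientation, a rule of this kind could be written in the standard Prometheus rule-file format roughly as follows. This is a minimal sketch, not the rule that Kublr ships: the group name, the expression, the severity label, and the annotation text are all assumptions chosen to match the instanceDown description ("down for over 1 min").

```yaml
# Hypothetical rule file. The group name, expression, label and annotation
# below are illustrative and are not taken from Kublr's shipped rules.
groups:
  - name: example-instance-rules
    rules:
      - alert: instanceDown
        # `up` is the built-in Prometheus metric that is 0 when a scrape target fails.
        expr: up == 0
        # Fire only after the condition has held continuously for 1 minute.
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down for over 1 min"
```

The `for` clause is what produces the "for N min" wording in the descriptions above: the alert stays pending until the expression has been true for that long.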

Customizing Alerts

Fired alerts can be found in the Prometheus | Alerts menu or in the Grafana | Alerts dashboard.

To send alert notifications to a Slack channel:

  • Create a webhook (Slack Setting | Add Web App | Incoming Webhooks | Add Incoming Webhooks Integration).
  • Deploy or redeploy the kublr-monitoring package with the following values:
alertmanager:
  config:
    default_receiver: slack
    receivers: |
      - name: slack
        slack_configs:
          - api_url: '<slack_api_url>'
            channel: '<channel_name>'

Alternatively, deploy the Kublr platform with the above values added to the spec.features.monitoring section of the cluster specification:

spec:
  features:
    monitoring:
      enabled: true
      platform:
        enabled: true
      grafana:
        enabled: true
        persistent: true
        size: 128G
      prometheus:
        persistent: true
        size: 128G
      alertmanager:
        enabled: true
      values:
        alertmanager:
          config:
            default_receiver: slack
            receivers: |
              - name: slack
                slack_configs:
                  - api_url: '<slack_api_url>'
                    channel: '<channel_name>'
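
The Slack receiver can be tuned further with options from Alertmanager's own slack_config section. The sketch below reuses the package values shown above and adds two such options. It assumes the receivers text is passed to Alertmanager unchanged, so the comment lines inside it are read as ordinary YAML comments. The username value is purely illustrative.

```yaml
alertmanager:
  config:
    default_receiver: slack
    receivers: |
      - name: slack
        slack_configs:
          - api_url: '<slack_api_url>'
            channel: '<channel_name>'
            # Also post a message when an alert is resolved (off by default for Slack).
            send_resolved: true
            # Display name of the posting bot; illustrative value.
            username: 'kublr-alerts'
```

With send_resolved enabled, the channel shows both when an alert fires and when it clears, which can make it easier to tell whether an incident is still ongoing.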
