Default AlertManager rules in Kublr

Alert Name                                  Description
authGetTokenFail                            Error rate of failed get-token requests for service is heightened on instance over last 3 min
clusterCpuUsageHigh                         CPU usage on instance is higher than 80% for 8 min
clusterMemoryUsageHigh                      Memory usage on the cluster is higher than 90% for 5 min
clusterMemoryUsageLow                       Memory usage on the cluster is lower than 60% for 5 min
daemonSetMisscheduledNumberHigh             Number of misscheduled pods is high in cluster for DaemonSet for 7 min
daemonSetReadyNumberLow                     Number of ready pods is low in cluster for DaemonSet for 7 min
daemonSetScheduledNumberLow                 Number of scheduled pods is low in cluster for DaemonSet for 7 min
deploymentAvailableNumberLow                Number of available replicas is low in cluster for Deployment for 7 min
deploymentReplicaNumberLow                  Number of replicas is low in cluster for Deployment for 7 min
deploymentUnavailableNumberHigh             Number of unavailable replicas is high in cluster for Deployment for 7 min
instanceCpuUsageHigh                        CPU usage on instance is higher than 80% for 8 min
instanceDiskInodesFreeLow                   Device on instance has few inodes left for over 10 min
instanceDiskSpaceFreeLow                    Device on instance has less than 10% of free space for over 10 min
instanceDown                                Fires if some instance is down for over 1 min in cluster
instanceMemoryUsageHigh                     Memory usage on instance is higher than 95% for 5 min
instanceSwapUsageHigh                       Swap memory exists on one or more instances. Swap usage is not recommended
k8sApiServerDown                            Kubernetes API server is down for over 1 min in cluster
KubeApiServerAbsent                         No kube-apiservers are available in cluster for 1 min
kubeletDockerOperationErrors                Docker operation error rate is heightened on instance over last 10 min
KubeMetricServerFailure                     Kube-metric-server is unavailable in cluster for 1 min
KubePersistentVolumeFullInFourDays          The persistent volume is expected to fill up within four days
nodeStatusCondition                         Node condition is not ready on a node for 7 min in cluster
nodeStatusNotReady1                         Node status is not ready for 7 min in cluster
podContainerRestarting                      Pod container is restarting for 7 min
podContainerWaiting                         Pod container is waiting for 7 min
podPhaseIncorrect                           Pod is stuck in a wrong phase for 7 min
podStatusNotReady                           Pod is not ready for 7 min
podStatusNotScheduled                       Pod is not scheduled for 7 min
promRuleEvalFailures                        Prometheus failed to evaluate rule in cluster
replicaSetFullyLabeledNumberLow             Number of fully labeled replicas is low in cluster for ReplicaSet for 7 min
replicaSetReadyNumberLow                    Number of ready replicas is low in cluster for ReplicaSet for 7 min
replicaSetReplicaNumberLow                  Number of replicas is low in cluster for ReplicaSet for 7 min
replicationControllerAvailableNumberLow     Number of available replicas is low in cluster for ReplicationController for 7 min
replicationControllerFullyLabeledNumberLow  Number of fully labeled replicas is low in cluster for ReplicationController for 7 min
replicationControllerReadyNumberLow         Number of ready replicas is low in cluster for ReplicationController for 7 min
replicationControllerReplicaNumberLow       Number of replicas is low in cluster for ReplicationController for 7 min
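
For orientation, a rule such as instanceDown could be written as a Prometheus alerting rule along the following lines. This is only a sketch: the group name, the `up == 0` expression, and the severity label are assumptions made for illustration, and the rule that Kublr actually ships may be defined differently. Only the alert name and the 1 min duration come from the table above.

```yaml
# Illustrative sketch only; not the rule definition shipped with Kublr.
groups:
  - name: example-instance-rules    # hypothetical group name
    rules:
      - alert: instanceDown
        expr: up == 0               # assumed expression: the scrape target is unreachable
        for: 1m                     # matches "down for over 1 min" in the table
        labels:
          severity: critical        # assumed label, not taken from the source
        annotations:
          description: 'Instance {{ $labels.instance }} is down for over 1 min'
```

The `for` clause is what turns a momentary condition into the "for N min" wording used in the descriptions: the alert fires only after the expression has stayed true for that long.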

Customizing Alerts

Fired alerts can be found in the Prometheus | Alerts menu:

[Image: fired alerts in Prometheus]

or on the Grafana | Alerts dashboard:

[Image: fired alerts in Grafana]

To send alert notifications to a Slack channel:

  • Create a webhook (Slack Setting | Add Web App | Incoming Webhooks | Add Incoming Webhooks Integration).
  • Deploy or redeploy the kublr-monitoring package with the following values:
alertmanager:
  config:
    default_receiver: slack
    receivers: |
      - name: slack
        slack_configs:
          - api_url: '<slack_api_url>'
            channel: '<channel_name>'

Alternatively, deploy the Kublr platform with the above values added to the spec.features.monitoring section of the cluster specification:

spec:
  features:
    monitoring:
      enabled: true
      platform:
        enabled: true
      grafana:
        enabled: true
        persistent: true
        size: 128G
      prometheus:
        persistent: true
        size: 128G
      alertmanager:
        enabled: true
      values:
        alertmanager:
          config:
            default_receiver: slack
            receivers: |
              - name: slack
                slack_configs:
                  - api_url: '<slack_api_url>'
                    channel: '<channel_name>'
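
The Slack receiver above notifies the channel only when an alert fires. If a message is also wanted when the alert clears, the receiver could be extended as sketched below. `send_resolved` is a standard Alertmanager `slack_configs` option; the sketch assumes that the `receivers` block is passed through to the Alertmanager configuration unchanged, which is not confirmed here.

```yaml
alertmanager:
  config:
    default_receiver: slack
    receivers: |
      - name: slack
        slack_configs:
          - api_url: '<slack_api_url>'
            channel: '<channel_name>'
            # Assumption: this block reaches Alertmanager as-is.
            # send_resolved also posts a message when the alert is resolved.
            send_resolved: true
```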