Kubernetes Cluster Hardware Recommendations

Overview

This document covers the minimum hardware recommendations for the Kublr Platform and Kublr Kubernetes clusters. After reading it, you can proceed with deploying the Kublr Platform and a Kubernetes cluster.

Kublr Kubernetes Cluster Requirements

| Role | Minimal required memory | Minimal required CPU (cores) | Components |
| --- | --- | --- | --- |
| Master node | 2 GB | 1.5 | Kublr-Kubernetes master components (k8s-core, cert-updater, fluentd, kube-addon-manager, rescheduler, network, etcd, proxy, kubelet) |
| Worker node | 700 MB | 0.5 | Kublr-Kubernetes worker components (fluentd, dns, proxy, network, kubelet) |
| Centralized monitoring agent * | 2 GB | 0.7 | Prometheus. We recommend a 2 GB limit for a typical managed cluster installation with 8 worker nodes and 40 pods per node (320 pods total). The retention period for the Prometheus agent is 1 hour. |

Kublr Platform Feature Requirements

| Feature | Required memory | Required CPU |
| --- | --- | --- |
| Control Plane | 1.9 GB | 1.2 |
| Centralized monitoring | 5 GB | 1.2 |
| Centralized logging | 11 GB | 1.4 |
| k8s core components | 0.5 GB | 0.15 |

Kublr Platform Deployment Example

A single-master Kubernetes cluster with one or two worker nodes, using all Kublr features (two worker nodes for basic reliability).

For a minimal Kublr Platform installation you need one master node with 4 GB of memory and 2 CPUs, plus worker nodes with a total of 10 GB + 1 GB × (number of worker nodes) of memory and 4.4 + 0.5 × (number of worker nodes) CPU cores.
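
As a rough illustration, the worker-node sizing formula above can be evaluated for a given node count. This is a minimal Python sketch; the constants come directly from the formula above, and the function name is only illustrative:

    # Minimal sketch: total worker-node capacity needed for a minimal
    # Kublr Platform installation, per the formula above.
    def minimal_worker_capacity(num_worker_nodes):
        memory_gb = 10 + 1 * num_worker_nodes     # 10 GB base + 1 GB per worker node
        cpu_cores = 4.4 + 0.5 * num_worker_nodes  # 4.4 cores base + 0.5 per worker node
        return memory_gb, cpu_cores

    # Two worker nodes (the recommended minimum for basic reliability):
    mem, cpu = minimal_worker_capacity(2)
    print(f"{mem} GB memory, {cpu} CPU cores")  # 12 GB memory, 5.4 CPU cores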

Please note: we do not recommend using this configuration in production, but it is suitable for starting to explore the Kublr Platform.

| Provider | Master Instance Type | Worker Instance Type |
| --- | --- | --- |
| Amazon Web Services | t2.large/t3.large (2 vCPU, 4 GB) | 2 × t2.xlarge/t3.xlarge (4 vCPU, 16 GB) |
| Google Cloud Platform | n1-standard-2 (2 vCPU, 7.5 GB) | 2 × n1-standard-4 (4 vCPU, 15 GB) |
| Microsoft Azure | A2 v2 (2 vCPU, 4 GB) | 2 × A8 v2 (8 vCPU, 16 GB) |
| On-premises | 2 vCPU, 5 GB | 2 × VM (3 vCPU, 16 GB) |

Workload Example

Master node: Kublr-Kubernetes master components (2 GB, 1.5 vCPU)

Worker node 1: Kublr-Kubernetes worker components (0.7 GB, 0.5 vCPU), Feature: Control Plane (1.9 GB, 1.2 vCPU), Feature: Centralized monitoring (5 GB, 1.2 vCPU), Feature: k8s core components (0.5 GB, 0.15 vCPU), Feature: Centralized logging (11 GB, 1.4 vCPU)

Worker node 2: Kublr-Kubernetes worker components (0.7 GB, 0.5 vCPU), Feature: Centralized logging (11 GB, 1.4 vCPU)

Self-Hosted Features

Kublr has several self-hosted features that can be installed separately in Kublr-Kubernetes clusters.

| Feature | Required memory | Required CPU |
| --- | --- | --- |
| Self-hosted logging | 9 GB | 1 |
| Self-hosted monitoring | 2.8 GB | 1.4 |

Calculating Needed Memory and CPU Availability for Business Applications

Note: by default, Kublr disables scheduling business applications on master nodes (you can change that), so only worker nodes are used in the formulas below.

Available memory = (number of nodes) × (memory per node) - (number of nodes) × 0.7 GB - (has Self-hosted logging) × 9 GB - (has Self-hosted monitoring) × 2.8 GB - 0.4 GB - 2 GB (centralized monitoring agent per cluster).

Available CPU = (number of nodes) × (vCPU per node) - (number of nodes) × 0.5 - (has Self-hosted logging) × 1 - (has Self-hosted monitoring) × 1.4 - 0.1 - 0.7 (centralized monitoring agent per cluster).

Example

Suppose a user wants to create a Kublr-Kubernetes cluster with 5 n1-standard-4 nodes (in Google Cloud Platform), with self-hosted logging enabled but self-hosted monitoring disabled. Then:

  • Available memory = 5 × 15 - 5 × 0.7 - 1 × 9 - 0 × 2.8 - 0.4 - 2 = 60.1 GB
  • Available CPU = 5 × 4 - 5 × 0.5 - 1 × 1 - 0 × 1.4 - 0.1 - 0.7 = 15.7 vCPUs
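
The same calculation can be scripted. Below is a minimal Python sketch of the two formulas above; the constants and the example values (5 × n1-standard-4, self-hosted logging on, self-hosted monitoring off) come from this document, and the function name is only illustrative:

    # Minimal sketch of the available-capacity formulas above.
    def available_capacity(nodes, mem_per_node_gb, vcpu_per_node,
                           self_hosted_logging, self_hosted_monitoring):
        # Per-node worker overhead: 0.7 GB / 0.5 vCPU; self-hosted logging: 9 GB / 1 vCPU;
        # self-hosted monitoring: 2.8 GB / 1.4 vCPU; fixed overhead: 0.4 GB / 0.1 vCPU;
        # centralized monitoring agent: 2 GB / 0.7 vCPU per cluster.
        mem = (nodes * mem_per_node_gb - nodes * 0.7
               - (9 if self_hosted_logging else 0)
               - (2.8 if self_hosted_monitoring else 0)
               - 0.4 - 2)
        cpu = (nodes * vcpu_per_node - nodes * 0.5
               - (1 if self_hosted_logging else 0)
               - (1.4 if self_hosted_monitoring else 0)
               - 0.1 - 0.7)
        return mem, cpu

    # 5 x n1-standard-4 nodes (4 vCPU, 15 GB), logging enabled, monitoring disabled:
    mem, cpu = available_capacity(5, 15, 4, True, False)
    print(round(mem, 1), round(cpu, 1))  # 60.1 15.7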

Note: centralized monitoring available in the Kublr Platform is used instead of self-hosted monitoring in this example.

Total Required Disk Calculation for Prometheus

To plan the disk capacity of a Prometheus server, you can use the rough formula:

RequiredDiskSpaceInBytes = RetentionPeriodInSeconds * IngestedSamplesPerSecond * BytesPerSample

RetentionPeriodInSeconds = 7 days by default (7 * 24 * 3600)

BytesPerSample = 2 bytes, in accordance with the Prometheus documentation (http://prometheus.io/docs/prometheus/latest/storage/)

IngestedSamples can be calculated as follows:

IngestedSamples = IngestedSamplesPerKublrPlatform + Sum(IngestedSamplesPerKublrCluster)

IngestedSamplesPerKublrPlatform = (IngestedSamplesPerMasterNode * NumOfMasterNodes) + (IngestedSamplesPerWorkingNode * NumOfWorkingNodes) + IngestedSamplesPerControlPlane

IngestedSamplesPerKublrCluster = (IngestedSamplesPerMasterNode * NumOfMasterNodes) + (IngestedSamplesPerWorkingNode * NumOfWorkingNodes) + Sum(IngestedSamplesPerUserApplication)

IngestedSamplesPerMasterNode = 1000 samples can be used for a regular Kublr cluster installation

IngestedSamplesPerWorkingNode = 500 samples can be used for a regular Kublr cluster installation

IngestedSamplesPerControlPlane = 2500 samples can be used for a regular Kublr ControlPlane deployment

IngestedSamplesPerUserApplication should be estimated by the user
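
For illustration, here is a minimal Python sketch of this estimate. The per-node sample rates, retention period, and bytes per sample come from above; the platform and cluster layout used in the call is hypothetical:

    # Minimal sketch of the Prometheus disk-space estimate above.
    RETENTION_SECONDS = 7 * 24 * 3600   # 7-day default retention period
    BYTES_PER_SAMPLE = 2                # per the Prometheus documentation

    def ingested_samples(master_nodes, working_nodes, extra=0):
        # 1000 samples/s per master node, 500 per working node, plus control-plane
        # or user-application samples passed in via "extra".
        return 1000 * master_nodes + 500 * working_nodes + extra

    # Hypothetical layout: a platform with 3 masters / 2 workers plus two managed
    # clusters with 3 masters / 5 workers each and no extra application metrics.
    total = (ingested_samples(3, 2, extra=2500)        # Kublr Platform
             + 2 * ingested_samples(3, 5, extra=0))    # managed clusters
    disk_bytes = RETENTION_SECONDS * total * BYTES_PER_SAMPLE
    print(f"{disk_bytes / 1e9:.1f} GB")  # ~21.2 GB for this layout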

Total Required Disk Calculation for Elasticsearch

To plan the disk capacity of an Elasticsearch cluster, you can use the rough formula:

RequiredDiskSpaceInGBytes = 4*NumberOfElasticsearchMasterNodes + (0.7*NumberOfPlatformMasterNodes + 0.5*NumberOfPlatformWorkingNodes + 0.7*NumberOfAllClusterMasterNodes + 0.07*NumberOfAllClusterWorkingNodes + AllClustersDailyPayload) * (CuratorPeriod+1) * SafetyFactor

AllClustersDailyPayload = Ratio * SizeOfAllLogsGeneratedByNonKublrContainers

The recommended Ratio is 7 for an average log record size of 132 bytes (we have established a ratio of 9.5 for an average log record size of 49 bytes).

The default CuratorPeriod is 2, which means Curator will delete indices older than 2 days. To change it, please refer to https://docs.kublr.com/logging/#5-change-parameters-to-collect-logs-for-more-than-2-days

For example, let's calculate the required space for a platform (with 3 master nodes and 2 worker nodes) and two clusters created by the platform (each with 3 master nodes and 5 worker nodes), where each cluster runs a business application that generates 3.4 GB of logs per day. The CuratorPeriod (the log cleaning period) will be 14 days. Let's use a SafetyFactor of 1.3 (+30% of the minimum calculated disk space to compensate for calculation errors).

AllClustersDailyPayload = 7 * (3.4 * 2) = 47.6

RequiredDiskSpaceInGBytes = 4*3 + (0.7*3 + 0.5*2 + 0.7*6 + 0.07*10 + 47.6) * (14+1) * 1.3 = 1096.2

To plan the disk capacity of a self-hosted Elasticsearch, you can use the rough formula:

RequiredDiskSpaceInGBytes = 4*NumberOfElasticsearchMasterNodes + (0.5*NumberOfClusterMasterNodes + 0.4*NumberOfClusterWorkingNodes + DailyPayload) * (CuratorPeriod+1) * SafetyFactor
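
As a sanity check, here is a minimal Python sketch of the centralized-logging formula, reproducing the worked example above (the function name is only illustrative):

    # Minimal sketch of the centralized Elasticsearch disk-space estimate,
    # using the worked example above: 3 Elasticsearch master nodes, a platform
    # with 3 masters / 2 workers, two clusters with 3 masters / 5 workers each,
    # 3.4 GB of application logs per cluster per day, CuratorPeriod = 14,
    # SafetyFactor = 1.3.
    def required_disk_gb(es_masters, platform_masters, platform_workers,
                         cluster_masters, cluster_workers, daily_payload_gb,
                         curator_period, safety_factor):
        return (4 * es_masters
                + (0.7 * platform_masters + 0.5 * platform_workers
                   + 0.7 * cluster_masters + 0.07 * cluster_workers
                   + daily_payload_gb)
                * (curator_period + 1) * safety_factor)

    daily_payload = 7 * (3.4 * 2)  # Ratio * logs generated by non-Kublr containers
    print(round(required_disk_gb(3, 3, 2, 6, 10, daily_payload, 14, 1.3), 1))
    # ~1096.2 GB, matching the example above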

Elasticsearch Configuration Recommendations

The default number of Master/Data/Client nodes is 1/1/1. It is highly recommended to use 3 or more master nodes in production.

Please review the Elasticsearch memory recommendations. The default heap size for a data node is 3072m. To change it, override the elasticsearch.data.heapSize value during cluster creation. Additional Elasticsearch environment variables can be provided by setting elasticsearch.cluster.env values.

According to load tests, 100 pods, each generating one 16 KB record per second, raise the CPU consumption of an Elasticsearch data node to 0.4. When 100 pods each generate 10-50 records of 132 bytes per second, the CPU consumption of an Elasticsearch data node is 0.3.