Kubernetes Cluster Hardware Recommendations

Overview

This document covers the minimal hardware recommendations for the Kublr Platform and Kublr Kubernetes cluster. Once read, you can proceed with the deployment of the Kublr Platform and Kubernetes cluster.

Kublr Kubernetes Cluster Requirements

RoleMinimal required memoryMinimal required CPU (cores)Components
Master node2 GB1.5Kublr-Kubernetes master components (k8s-core, cert-updater, fluentd, kube-addon-manager, rescheduler, network, etcd, proxy, kubelet)
Worker node700 mB0.5Kublr-Kubernetes worker components (fluentd, dns, proxy, network, kubelet)
Centralized monitoring agent *2 GB0.7Prometheus. We recommend limit 2GB for typical installation of managed cluster which has 8 working, 40 pods per node with total 320 nodes. Retention period for Prometheus agent is 1 hour.

Kublr Platform Feature Requirements

FeatureRequired memoryRequired CPU
Feature: Control Plane1.9 GB1.2
Feature: Centralized monitoring5 GB1.2
Feature: Centralized logging11 GB1.4
Feature: k8s core components0.5 GB0.15

Kublr Platform Deployment Example

Single master Kubernetes cluster, at one-two worker nodes, use all Kublr’s features (two for basic reliability)

For a minimal Kublr Platform installation you should have one master node with 4GB memory and 2 CPU and worker node(s) with total 10GB + 1GB × (number of nodes) and 4.4 + 0.5 × (number of nodes) CPU cores.

Please note: We do not recommend using this configuration in production but this configuration is suitable to start exploring the Kublr Platform.

ProviderMaster Instance TypeWorker Instance Type
Amazon Web Servicest2.large/t3.large (2 vCPU, 4GB)2 × t2(t3) xlarge (4 vCPU, 16GB)
Google Cloud Platformn1-standard-2 (2 vCPU, 7.5GB)2 × n1-standard-4 (4 vCPU, 15GB)
Microsoft AzureA2 v2 (2 vCPU, 4GB)2 × A8 v2 (8 vCPU, 16GB)
On-premises2 vCPU, 5GB2 × VM (3 vCPU, 16GB)

Workload Example

Master node: Kublr-Kubernetes master components (2 GB, 1.5 vCPU),

Worker node 1: Kublr-Kubernetes worker components (0.7 GB, 0.5 vCPU), Feature: ControlPlane (1.9GB, 1.2 vCPU), Feature: Centralized monitoring (5 GB, 1.2 vCPU) Feature: k8s core components (0.5 GB, 0.15 vCPU) Feature: Centralized logging (11GB, 1.4 vCPU)

Worker node 2: Kublr-Kubernetes worker components (0.7 GB, 0.5 vCPU), Feature: Centralized logging (11GB, 1.4 vCPU)

Self-Hosted Features

Kublr has several self-hosted features, which could be installed separately in Kublr-Kubernetes clusters.

FeatureRequired memoryRequired CPU
Self-hosted logging9GB1
Self-hosted monitoring2.8GB1.4

Calculating Needed Memory and CPU Availability for Business Applications

Note: By default Kublr disables scheduling business application on the master, which can be modified. Thus, we use only worker nodes in our formula.

Available memory = (number of nodes) × (memory per node) - (number of nodes) × 0.7GB - (has Self-hosted logging) × 9GB - (has Self-hosted monitoring) × 2.9GB - 0.4 GB - 2GB (Central monitoring agent per every cluster).

Available CPU = (number of nodes) × (vCPU per node) - (number of nodes) × 0.5 - (has Self-hosted logging) × 1 - (has Self-hosted monitoring) × 1.4 - 0.1 - 0.7 (Central monitoring agent per every cluster).

Example

User wants to create a Kublr-Kubernetes cluster with 5 n1-standard-4 nodes (in Google Cloud Platform) with enabled Self-hosted logging, but disabled Self-hosted monitoring, then:

  • Available memory = 5 × 15 - 5 × 0.7 - yes ×9 - no × 2.8 - 0.4 - 2= 60.1GB.
  • Available CPU = 5 × 4 - 5 × 0.5 - yes × 1 - no × 1.4 - 0.1 - 0.7= 15.7 vCPUs.

Note: You will use centralized monitoring available in the Kublr Platform instead of Self-hosted monitoring

Total Required Disk calculation for Prometheus

To plan the disk capacity of a Prometheus server, you can use the rough formula:

RequiredDiskSpaceInBytes = RetentionPeriodInSeconds * IngestedSamplesPerSecond * BytesPerSample

RetentionPeriodInSeconds = 7 days by default (7 * 24 * 3600) BytesPerSample = 2 bytes in accordance with Prometheus documentation (http://prometheus.io/docs/prometheus/latest/storage/) IngestedSamples can be calculated as following:

IngestedSamples = IngestedSamplesPerKublrPlatform + Sum(IngestedSamplesPerKublrCluster)

IngestedSamplesPerKublrPlatform = (IngestedSamplesPerMasterNode * NumOfMasterNodes) + (IngestedSamplesPerWorkingNode * NumOfWorkingNodes) + IngestedSamplesPerControlPlane

IngestedSamplesPerKublrCluster = (IngestedSamplesPerMasterNode * NumOfMasterNodes) + (IngestedSamplesPerWorkingNode * NumOfWorkingNodes) + Sum(IngestedSamplesPerUserApplication)

IngestedSamplesPerMasterNode = 1000 samples can be used for regular Kublr Cluster Installation IngestedSamplesPerWorkingNode = 500 samples can be used for regular Kublr Cluster Installation IngestedSamplesPerControlPlane = 2500 samples can be used for regular Kublr ControlPlane deployment IngestedSamplesPerUserApplication = should be estimated by user

Total Required Disk calculation for Elasticsearch

To plan the disk capacity of Elasticsearch, you can use the rough formula:

RequiredDiskSpaceInGBytes = 4*NumberOfElasticsearchMasterNodes + (0.7*NumberOfPlatformMasterNodes + 0.5*NumberOfPlatformWorkingNodes + 0.7*NumberOfAllClusterMasterNodes + 0.07*NumberOfAllClusterWorkingNodes + AllClustersDailyPayload) * (CuratorPeriod+1) * SafetyFactor

AllClustersDailyPayload = Ratio * SizeOfAllLogsGeneratedByNonKublrContainers

Recommended Ratio is 7 for average size of log records equals 132 bytes (we have established ratio = 9.5 for average size of log records equals 49 bytes)

Default CuratorPeriod = 2. It means Curator will delete indexes older than 2 days. To change please refer https://docs.kublr.com/logging/#5-change-parameters-to-collect-logs-for-more-than-2-days

For example, let’s calculate required space for platform (with 3 master nodes and 2 work nodes) and two clusters created by platform (each cluster has 3 master node, 5 work nodes), each one deployed with some business application that generates 3.4Gb of logs every day. CuratorPeriod (period of logs cleaning) will be 14 days. Let’s use Safety Factor equals 1.3 (+30% of minimal calculated disk space to compensate for the errors of calculation)

AllClustersDailyPayload = 7 * (3.4*2) = 47.6 RequiredDiskSpaceInGBytes = 4*3 + ( 0.7*3 + 0.5*2 + 0.7*6 + 0.0710 + 47.6)(14+1) * 1.3 = 1096.2 To plan the disk capacity of a SelfHosted Elasticsearch, you can use the rough formula:

RequiredDiskSpaceInGBytes = 4*NumberOfElasticsearchMasterNodes + (0.5*NumberOfClusterMasterNodes + 0.4*NumberOfClusterWorkingNodes + DailyPayload) * (CuratorPeriod+1) * SafetyFactor

Elasticsearch configuration recommendations

Default number of Master/Data/Client nodes is 1/1/1. It is highly recommended to use 3 or more master nodes in production.

Please research Elasticsearch memory recommendations. Default heap size for data node is 3072m. To change it, please override elasticsearch.data.heapSize value during cluster creation as in example. It is possible to provide additional Elasticsearch environment variables by setting elasticsearch.cluster.env values.

According to load tests, 100 pods (one record, the size of 16kbytes, is generated every second) raise CPU consumption of Elasticsearch data node to 0.4. In case of 100 pods generating 10-50 records of 132 bytes every second, CPU consumption of Elasticsearch data node would be 0.3