Kubernetes Cluster Hardware Recommendations

Overview

This document covers the minimum hardware recommendations for the Kublr Platform and Kublr Kubernetes clusters. After reading it, you can proceed with deploying the Kublr Platform and a Kubernetes cluster.

Kublr Kubernetes Cluster Requirements

Role	Minimal required memory	Minimal required CPU (cores)	Components
Master node	2 GB	1.5	Kublr-Kubernetes master components (k8s-core, cert-updater, fluentd, kube-addon-manager, rescheduler, network, etcd, proxy, kubelet)
Worker node	700 MB	0.5	Kublr-Kubernetes worker components (fluentd, dns, proxy, network, kubelet)
Centralized monitoring agent *	2 GB	0.7	Prometheus. We recommend a 2 GB limit for a typical managed cluster installation with 8 worker nodes and 40 pods per node (320 pods in total). The retention period for the Prometheus agent is 1 hour.

Kublr Platform Feature Requirements

Feature	Required memory	Required CPU
Feature: Control Plane	1.9 GB	1.2
Feature: Centralized monitoring	5 GB	1.2
Feature: Centralized logging	11 GB	1.4
Feature: k8s core components	0.5 GB	0.15

Kublr Platform Deployment Example

Single-master Kubernetes cluster with one or two worker nodes (two for basic reliability), using all Kublr features

For a minimal Kublr Platform installation, you should have one master node with 4 GB of memory and 2 CPUs, and worker node(s) with a total of 10 GB + 1 GB × (number of nodes) of memory and 4.4 + 0.5 × (number of nodes) CPU cores.
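The worker sizing rule above can be sketched in Python as follows. This only illustrates the arithmetic; the names WorkerTotals and minimal_worker_totals are hypothetical and not part of Kublr.

```python
from typing import NamedTuple


class WorkerTotals(NamedTuple):
    """Totals summed over all worker nodes (hypothetical helper type)."""
    memory_gb: float
    cpu_cores: float


def minimal_worker_totals(worker_nodes: int) -> WorkerTotals:
    """Total resources needed across all worker nodes of a minimal platform."""
    if worker_nodes < 1:
        raise ValueError("at least one worker node is required")
    # Rule from the text: 10 GB + 1 GB per node, 4.4 cores + 0.5 cores per node.
    return WorkerTotals(
        memory_gb=10 + 1 * worker_nodes,
        cpu_cores=4.4 + 0.5 * worker_nodes,
    )


if __name__ == "__main__":
    for n in (1, 2):
        totals = minimal_worker_totals(n)
        print(f"{n} worker node(s): {totals.memory_gb} GB, "
              f"{totals.cpu_cores:.1f} cores in total")
```

For two worker nodes this gives 12 GB of memory and 5.4 CPU cores in total.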

Please note: We do not recommend this configuration for production, but it is suitable for starting to explore the Kublr Platform.

Provider	Master Instance Type	Worker Instance Type
Amazon Web Services	t2.large/t3.large (2 vCPU, 4GB)	2 × t2.xlarge/t3.xlarge (4 vCPU, 16GB)
Google Cloud Platform	n1-standard-2 (2 vCPU, 7.5GB)	2 × n1-standard-4 (4 vCPU, 15GB)
Microsoft Azure	A2 v2 (2 vCPU, 4GB)	2 × A8 v2 (8 vCPU, 16GB)
On-Premises	2 vCPU, 5GB	2 × VM (3 vCPU, 16GB)

Workload Example

Master node: Kublr-Kubernetes master components (2 GB, 1.5 vCPU)

Worker node 1: Kublr-Kubernetes worker components (0.7 GB, 0.5 vCPU), Feature: Control Plane (1.9 GB, 1.2 vCPU), Feature: Centralized monitoring (5 GB, 1.2 vCPU), Feature: k8s core components (0.5 GB, 0.15 vCPU), Feature: Centralized logging (11 GB, 1.4 vCPU)

Worker node 2: Kublr-Kubernetes worker components (0.7 GB, 0.5 vCPU), Feature: Centralized logging (11 GB, 1.4 vCPU)

Self-Hosted Features

Kublr has several self-hosted features, which can be installed separately in Kublr-Kubernetes clusters.

Feature	Required memory	Required CPU
Self-hosted logging	9GB	1
Self-hosted monitoring	2.8GB	1.4

Calculating Needed Memory and CPU Availability for Business Applications

Note: By default, Kublr disables scheduling of business applications on master nodes (this can be changed), so the formulas below count only worker nodes.

Available memory = (number of nodes) × (memory per node) - (number of nodes) × 0.7 GB - (has Self-hosted logging) × 9 GB - (has Self-hosted monitoring) × 2.8 GB - 0.4 GB - 2 GB (Centralized monitoring agent, one per cluster).

Available CPU = (number of nodes) × (vCPU per node) - (number of nodes) × 0.5 - (has Self-hosted logging) × 1 - (has Self-hosted monitoring) × 1.4 - 0.1 - 0.7 (Centralized monitoring agent, one per cluster).

Here (has Self-hosted logging) and (has Self-hosted monitoring) are 1 if the feature is enabled and 0 if it is not.

Example

Suppose a user wants to create a Kublr-Kubernetes cluster with 5 n1-standard-4 nodes (in Google Cloud Platform), with Self-hosted logging enabled and Self-hosted monitoring disabled. Then:

Available memory = 5 × 15 - 5 × 0.7 - yes × 9 - no × 2.8 - 0.4 - 2 = 60.1 GB.
Available CPU = 5 × 4 - 5 × 0.5 - yes × 1 - no × 1.4 - 0.1 - 0.7 = 15.7 vCPUs.

Note: In this case you will use the centralized monitoring available in the Kublr Platform instead of Self-hosted monitoring.
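The two formulas can be sketched as a small Python helper that reproduces the worked example. The function name available_capacity and its parameters are hypothetical. The sketch uses 2.8 GB for Self-hosted monitoring, as in the Self-Hosted Features table and the worked example, and treats the unlabeled 0.4 GB and 0.1 CPU terms as fixed per-cluster overhead.

```python
def available_capacity(nodes, memory_per_node_gb, vcpu_per_node,
                       self_hosted_logging, self_hosted_monitoring):
    """Return (memory in GB, vCPUs) left for business applications.

    Only worker nodes are counted, as in the formulas in the text.
    """
    memory_gb = (
        nodes * memory_per_node_gb
        - nodes * 0.7                             # worker components, per node
        - (9 if self_hosted_logging else 0)       # Self-hosted logging
        - (2.8 if self_hosted_monitoring else 0)  # Self-hosted monitoring
        - 0.4                                     # fixed term from the formula
        - 2                                       # centralized monitoring agent
    )
    vcpus = (
        nodes * vcpu_per_node
        - nodes * 0.5                             # worker components, per node
        - (1 if self_hosted_logging else 0)       # Self-hosted logging
        - (1.4 if self_hosted_monitoring else 0)  # Self-hosted monitoring
        - 0.1                                     # fixed term from the formula
        - 0.7                                     # centralized monitoring agent
    )
    # Round away floating-point noise from the decimal constants.
    return round(memory_gb, 2), round(vcpus, 2)


if __name__ == "__main__":
    # 5 x n1-standard-4 (15 GB, 4 vCPU), logging enabled, monitoring disabled.
    print(available_capacity(5, 15, 4, True, False))  # (60.1, 15.7)
```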

Total Required Disk Calculation for Prometheus

To plan the disk capacity of a Prometheus server, you can use the rough formula:

RequiredDiskSpaceInBytes = RetentionPeriodInSeconds * IngestedSamplesPerSecond * BytesPerSample

RetentionPeriodInSeconds = 7 days by default (7 * 24 * 3600)

BytesPerSample = 2 bytes, in accordance with the Prometheus documentation (http://prometheus.io/docs/prometheus/latest/storage/)

IngestedSamplesPerSecond (written as IngestedSamples below) can be calculated as follows:

IngestedSamples = IngestedSamplesPerKublrPlatform + Sum(IngestedSamplesPerKublrCluster)

IngestedSamplesPerKublrPlatform = (IngestedSamplesPerMasterNode * NumOfMasterNodes) + (IngestedSamplesPerWorkingNode * NumOfWorkingNodes) + IngestedSamplesPerControlPlane

IngestedSamplesPerKublrCluster = (IngestedSamplesPerMasterNode * NumOfMasterNodes) + (IngestedSamplesPerWorkingNode * NumOfWorkingNodes) + Sum(IngestedSamplesPerUserApplication)

IngestedSamplesPerMasterNode = 1000 samples can be used for a regular Kublr cluster installation
IngestedSamplesPerWorkingNode = 500 samples can be used for a regular Kublr cluster installation
IngestedSamplesPerControlPlane = 2500 samples can be used for a regular Kublr Control Plane deployment
IngestedSamplesPerUserApplication = should be estimated by the user
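As a rough illustration, the Prometheus formulas can be combined in Python like this. The function names are hypothetical, and the sketch assumes that the per-node sample figures above are per-second rates that feed directly into IngestedSamplesPerSecond; the source does not state the time unit explicitly. The topology in the usage block is also only an assumed example.

```python
RETENTION_SECONDS = 7 * 24 * 3600   # default retention: 7 days
BYTES_PER_SAMPLE = 2                # figure quoted in the text

SAMPLES_PER_MASTER_NODE = 1000
SAMPLES_PER_WORKING_NODE = 500
SAMPLES_PER_CONTROL_PLANE = 2500


def node_samples(masters, workers):
    """Samples contributed by the master and working nodes of one cluster."""
    return (SAMPLES_PER_MASTER_NODE * masters
            + SAMPLES_PER_WORKING_NODE * workers)


def platform_samples(masters, workers):
    """IngestedSamplesPerKublrPlatform: its nodes plus the control plane."""
    return node_samples(masters, workers) + SAMPLES_PER_CONTROL_PLANE


def cluster_samples(masters, workers, user_app_samples=()):
    """IngestedSamplesPerKublrCluster: its nodes plus user applications."""
    return node_samples(masters, workers) + sum(user_app_samples)


def required_disk_bytes(ingested_samples,
                        retention_seconds=RETENTION_SECONDS,
                        bytes_per_sample=BYTES_PER_SAMPLE):
    """RequiredDiskSpaceInBytes from the rough formula in the text."""
    return retention_seconds * ingested_samples * bytes_per_sample


if __name__ == "__main__":
    # Assumed topology: platform with 1 master and 2 workers, plus one
    # managed cluster with 3 masters and 5 workers and no user metrics.
    total = platform_samples(1, 2) + cluster_samples(3, 5)
    print(total)                              # 10000
    print(required_disk_bytes(total) / 1e9)   # 12.096 (GB, decimal)
```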

Total Required Disk Calculation for Elasticsearch

To plan the disk capacity of Elasticsearch, you can use the rough formula:

RequiredDiskSpaceInGBytes = 4 * NumberOfElasticsearchMasterNodes + (0.7 * NumberOfPlatformMasterNodes + 0.5 * NumberOfPlatformWorkingNodes + 0.7 * NumberOfAllClusterMasterNodes + 0.07 * NumberOfAllClusterWorkingNodes + AllClustersDailyPayload) * (CuratorPeriod + 1) * SafetyFactor

AllClustersDailyPayload = Ratio * SizeOfAllLogsGeneratedByNonKublrContainers

The recommended Ratio is 7 when the average log record size is 132 bytes (we measured a ratio of 9.5 when the average log record size is 49 bytes).

The default CuratorPeriod is 2, which means Curator deletes indexes older than 2 days. To change it, please refer to https://docs.kublr.com/logging/#5-change-parameters-to-collect-logs-for-more-than-2-days

For example, let’s calculate the required space for a platform (with 3 master nodes and 2 worker nodes) and two clusters created by the platform (each cluster has 3 master nodes and 5 worker nodes), each one running a business application that generates 3.4 GB of logs every day. The calculation below also assumes 3 Elasticsearch master nodes. CuratorPeriod (the log cleaning period) will be 14 days. Let’s use a SafetyFactor of 1.3 (+30% on top of the minimal calculated disk space to compensate for calculation errors).

AllClustersDailyPayload = 7 * (3.4 * 2) = 47.6

RequiredDiskSpaceInGBytes = 4 * 3 + (0.7 * 3 + 0.5 * 2 + 0.7 * 6 + 0.07 * 10 + 47.6) * (14 + 1) * 1.3 = 1096.2

To plan the disk capacity of a self-hosted Elasticsearch, you can use the rough formula:

RequiredDiskSpaceInGBytes = 4 * NumberOfElasticsearchMasterNodes + (0.5 * NumberOfClusterMasterNodes + 0.4 * NumberOfClusterWorkingNodes + DailyPayload) * (CuratorPeriod + 1) * SafetyFactor
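Both Elasticsearch formulas can be sketched in Python as below. The numeric coefficients are read as multipliers (for example, 4 × NumberOfElasticsearchMasterNodes), which is the reading that reproduces the 1096.2 GB result of the worked example. All function and parameter names are hypothetical.

```python
def daily_payload_gb(raw_logs_gb_per_day, ratio=7):
    """AllClustersDailyPayload = Ratio * size of logs from non-Kublr containers."""
    return ratio * raw_logs_gb_per_day


def platform_disk_gb(es_masters, platform_masters, platform_workers,
                     cluster_masters, cluster_workers, payload_gb,
                     curator_period_days, safety_factor):
    """Disk estimate for centralized logging in the Kublr Platform."""
    # Bracketed term of the formula: per-node figures plus the daily payload.
    bracket = (0.7 * platform_masters
               + 0.5 * platform_workers
               + 0.7 * cluster_masters
               + 0.07 * cluster_workers
               + payload_gb)
    return (4 * es_masters
            + bracket * (curator_period_days + 1) * safety_factor)


def self_hosted_disk_gb(es_masters, cluster_masters, cluster_workers,
                        payload_gb, curator_period_days, safety_factor):
    """Disk estimate for self-hosted Elasticsearch in a single cluster."""
    bracket = 0.5 * cluster_masters + 0.4 * cluster_workers + payload_gb
    return (4 * es_masters
            + bracket * (curator_period_days + 1) * safety_factor)


if __name__ == "__main__":
    # Worked example from the text: two clusters, 3.4 GB of logs per day each.
    payload = daily_payload_gb(3.4 * 2)
    total = platform_disk_gb(
        es_masters=3, platform_masters=3, platform_workers=2,
        cluster_masters=6, cluster_workers=10, payload_gb=payload,
        curator_period_days=14, safety_factor=1.3,
    )
    print(round(payload, 1), round(total, 1))  # 47.6 1096.2
```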

Elasticsearch Configuration Recommendations

Default number of Master/Data/Client nodes is 1/1/1. It is highly recommended to use 3 or more master nodes in production.

Please review the Elasticsearch memory recommendations. The default heap size for a data node is 3072m. To change it, override the elasticsearch.data.heapSize value during cluster creation, as in the example. Additional Elasticsearch environment variables can be provided by setting elasticsearch.cluster.env values.

According to load tests, 100 pods that each generate one 16 KB record every second raise the CPU consumption of an Elasticsearch data node to 0.4. With 100 pods each generating 10-50 records of 132 bytes every second, the CPU consumption of an Elasticsearch data node is 0.3.