# Optimizing Debian for K3s

This guide outlines steps to optimize a Debian server for running K3s (Lightweight Kubernetes). Optimization involves a combination of general Linux best practices, K3s-specific recommendations, and considerations for your specific workload.

## Table of Contents

- [1. Debian Base System Optimization](#1-debian-base-system-optimization)
  - [a. Kernel Parameters (sysctl.conf)](#a-kernel-parameters-sysctlconf)
  - [b. User Limits (ulimit)](#b-user-limits-ulimit)
  - [c. Disable Unnecessary Services](#c-disable-unnecessary-services)
  - [d. Update System](#d-update-system)
  - [e. Swap Configuration](#e-swap-configuration)
- [2. K3s Specific Optimizations](#2-k3s-specific-optimizations)
  - [a. Choose a Performant Storage Backend](#a-choose-a-performant-storage-backend)
  - [b. Containerd Tuning](#b-containerd-tuning)
  - [c. K3s Server and Agent Configuration](#c-k3s-server-and-agent-configuration)
  - [d. CNI Choice](#d-cni-choice)
- [3. General Server Best Practices](#3-general-server-best-practices)
  - [a. Fast Storage](#a-fast-storage)
  - [b. Adequate RAM and CPU](#b-adequate-ram-and-cpu)
  - [c. Network Configuration](#c-network-configuration)
  - [d. Monitoring](#d-monitoring)
  - [e. Logging](#e-logging)
- [4. Post-Optimization Verification](#4-post-optimization-verification)

---

## 1. Debian Base System Optimization

These steps are generally beneficial for any server, but particularly important for containerized environments like K3s.

### a. Kernel Parameters (sysctl.conf)

Edit `/etc/sysctl.conf` and apply the changes with `sudo sysctl -p`.

```ini
# Increase file handle and inotify limits (for container processes and K3s components)
fs.inotify.max_user_watches = 524288   # For file-watching inside containers
fs.inotify.max_user_instances = 8192   # For file-watching inside containers
fs.file-max = 2097152                  # Increase overall system file handle limit

# Increase limits for network connections
net.core.somaxconn = 65535             # Max backlog of pending connections
net.ipv4.tcp_tw_reuse = 1              # Allow reuse of TIME_WAIT sockets (caution: can sometimes mask issues)
net.ipv4.tcp_fin_timeout = 30          # Reduce TIME_WAIT duration
net.ipv4.tcp_max_syn_backlog = 65535   # Max number of remembered connection requests
net.ipv4.tcp_keepalive_time = 600      # Start keepalive probes after 10 minutes of idle time
net.ipv4.tcp_keepalive_intvl = 60      # Interval between keepalive probes
net.ipv4.tcp_keepalive_probes = 3      # Probes before a dead connection is dropped

# Increase memory limits for network buffers (especially under high network traffic)
net.core.rmem_max = 26214400
net.core.wmem_max = 26214400
net.core.rmem_default = 26214400
net.core.wmem_default = 26214400

# Other useful parameters
vm.max_map_count = 262144              # Essential for Elasticsearch, MongoDB, etc.
vm.dirty_background_ratio = 5          # Start background writeback earlier
vm.dirty_ratio = 10                    # Cap dirty pages so write bursts flush sooner
kernel.pid_max = 4194304               # Increase max PIDs
```

**Explanation:**

- `fs.file-max`: K3s and its deployed containers can open a large number of files. Increasing this prevents "Too many open files" errors.
- `net.*`: These parameters help the node handle a high number of concurrent network connections, which is crucial for a Kubernetes cluster.
- `vm.max_map_count`: Required by some applications that run on Kubernetes (e.g., Elasticsearch).

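If you prefer to keep K3s-related tuning separate from the distribution defaults, the same parameters can live in a drop-in file under `/etc/sysctl.d/` and be loaded with `sysctl --system`. A minimal sketch; the file name `99-k3s.conf` is just an example, and you would include whichever parameters from the block above you actually need:

```bash
# Write K3s-related kernel tuning to its own drop-in file (file name is arbitrary)
sudo tee /etc/sysctl.d/99-k3s.conf > /dev/null <<'EOF'
fs.file-max = 2097152
vm.max_map_count = 262144
net.core.somaxconn = 65535
EOF

# Reload all sysctl configuration files, including the new drop-in
sudo sysctl --system
```
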
### b. User Limits (ulimit)

Edit `/etc/security/limits.conf` (or create a file like `/etc/security/limits.d/k3s.conf`) for all users, or specifically for the user K3s runs as (often `root` by default, or a dedicated `k3s` user).

```
# For all users (or a specific k3s user if you configure it)
*    soft    nofile    65536
*    hard    nofile    131072
*    soft    nproc     65536
*    hard    nproc     131072
```

**Note:** A reboot or logging out and back in is often required for these changes to take effect in user sessions. Services typically pick up new limits when they are restarted.

**Explanation:**

- `nofile` (number of open files): Sets the per-user/per-process limit. K3s and its pods need a high limit.
- `nproc` (number of processes): Each container spawns processes; a high limit prevents hitting a ceiling.

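One caveat: PAM limits from `limits.conf` apply to login sessions, not to systemd services such as the K3s service itself. The standard K3s install script already sets generous limits in its unit file, but if you need to override them, a systemd drop-in is the usual mechanism. A minimal sketch, assuming the default service name `k3s` from the install script (use `k3s-agent` on agent nodes); the values are examples:

```bash
# Hypothetical override raising resource limits for the k3s systemd service
sudo mkdir -p /etc/systemd/system/k3s.service.d
sudo tee /etc/systemd/system/k3s.service.d/limits.conf > /dev/null <<'EOF'
[Service]
LimitNOFILE=131072
LimitNPROC=131072
EOF

# Reload unit definitions and restart the service so the new limits take effect
sudo systemctl daemon-reload
sudo systemctl restart k3s
```
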
### c. Disable Unnecessary Services

Reducing background services frees up CPU, RAM, and I/O.

```bash
sudo systemctl disable --now apache2        # Example, replace with actual unused services
sudo systemctl disable --now nginx          # Example
sudo systemctl disable --now cups           # If not using printing
sudo systemctl disable --now ModemManager   # If not using a modem
sudo systemctl disable --now bluetooth      # If no Bluetooth devices

# Review currently enabled services with:
# systemctl list-unit-files --type=service --state=enabled
```

### d. Update System

Keep your system packages up to date for security and performance bug fixes.

```bash
sudo apt update
sudo apt upgrade -y
sudo apt dist-upgrade -y   # Also handles upgrades that install or remove packages
sudo apt autoremove -y
sudo reboot                # After significant kernel or base system updates
```

### e. Swap Configuration

**It is generally recommended to disable swap on K3s nodes, especially worker nodes.** Swapping can severely degrade performance in containerized environments due to unpredictable latency.

If you absolutely must keep swap (e.g., on a very low-memory server; not recommended for production), reduce swappiness: run `sudo sysctl vm.swappiness=10` (or even `1`) and add `vm.swappiness = 10` to `/etc/sysctl.conf` to persist it.

Otherwise, disable swap entirely:

```bash
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
```

**WARNING:** Only disable swap if your system has sufficient RAM to handle its workload without it. If nodes run out of memory without swap, processes will be OOM-killed, leading to instability.

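To confirm swap is really off (and stays off after a reboot):

```bash
# No output from swapon --show means no active swap devices
swapon --show

# The "Swap" line should report 0B total
free -h
```
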
## 2. K3s Specific Optimizations
### a. Choose a Performant Storage Backend

The choice of K3s's data store significantly impacts performance and availability.

* **SQLite (Default):** Good for single-node setups or small, non-critical clusters. Performance can degrade under a high rate of changes or in larger clusters.
* **External Database (MariaDB/MySQL, PostgreSQL):**
  * **Recommended for Production:** Offers high availability and better performance than embedded SQLite for multi-node K3s server configurations (see the example install command after this list).
  * **Placement:** Place the external database on a separate server or on a dedicated, fast storage volume.
* **External etcd:** Offers the best performance and scalability, but is more complex to manage and requires its own dedicated etcd cluster.

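As an example of the external-database option, a K3s server can be pointed at a MySQL-compatible datastore at install time. A sketch using the standard install script; the host `db-server`, database name, and credentials are placeholders:

```bash
# Hypothetical: install a K3s server backed by an external MySQL-compatible database
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint="mysql://k3s:password@tcp(db-server:3306)/kube"
```
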
### b. Containerd Tuning

K3s uses containerd as its container runtime.

* **Fast Storage for Containerd:** Ensure the directories where containerd stores its data are on fast storage (NVMe SSDs are ideal):
  * `/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs` (K3s specific)
  * `/var/lib/containerd` (if using a standalone containerd setup)

  This is critical for image pulls, container startup, and overlayfs performance. A quick way to check what backs these paths is shown below.

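A rough way to confirm which device backs the containerd data directory and how it performs (for anything more rigorous, use `fio`; the 1 GiB test file below is just an example):

```bash
# Which device and filesystem back the K3s data directory?
df -h /var/lib/rancher/k3s

# Rough sequential-write check on that filesystem (writes and then removes a 1 GiB test file)
sudo dd if=/dev/zero of=/var/lib/rancher/k3s/dd-test.img bs=1M count=1024 conv=fsync status=progress
sudo rm /var/lib/rancher/k3s/dd-test.img
```
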
### c. K3s Server and Agent Configuration

Configure K3s using a configuration file (e.g., `/etc/rancher/k3s/config.yaml`) or command-line flags.

* **Disable Unused Components:** Reduce resource consumption by disabling features you don't need.
  * `--disable traefik`: If using the Nginx Ingress Controller or another ingress.
  * `--disable servicelb`: If using a cloud provider load balancer, MetalLB, or another solution.
  * `--disable local-storage`: If using cloud provider storage, NFS, or another remote storage solution.
  * `--disable metrics-server`: If using a different metrics solution or you don't need it.
  * `--disable-helm-controller`: If you deploy exclusively with `kubectl` or other tooling (note this is its own flag, not a value for `--disable`).

**Example `/etc/rancher/k3s/config.yaml` for a server node:**

```yaml
# /etc/rancher/k3s/config.yaml
disable:
  - traefik
  - servicelb
  - local-storage
  - metrics-server
# Example for an external database:
# datastore-endpoint: "mysql://k3s:password@tcp(db-server:3306)/kube?parseTime=true"
```

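Changes to the configuration file only take effect once the service is restarted. Assuming the default unit names from the install script (`k3s` on servers, `k3s-agent` on agents):

```bash
# Apply configuration changes by restarting the K3s service
sudo systemctl restart k3s

# Verify the system components came back up
sudo k3s kubectl get pods -n kube-system
```
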
### d. CNI Choice

K3s defaults to Flannel (with VXLAN), which is performant for many use cases.

* **Alternative CNIs (Calico, Cilium):** If you require advanced network policies, higher throughput, or specific networking features, consider replacing Flannel. These CNIs can offer better raw throughput or lower latency, but they add complexity.
  * When installing K3s, you would typically skip the bundled Flannel (`--flannel-backend=none`) and then install your chosen CNI (see the example below).
  * Ensure your chosen CNI is configured with the required kernel modules and sysctls.

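A sketch of what that looks like at install time; the `--disable-network-policy` flag is shown on the assumption that the replacement CNI provides its own policy enforcement:

```bash
# Hypothetical: install a K3s server without the bundled Flannel CNI
curl -sfL https://get.k3s.io | sh -s - server \
  --flannel-backend=none \
  --disable-network-policy

# ...then install the chosen CNI (Calico, Cilium, etc.) following that project's documentation.
```
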
## 3. General Server Best Practices
### a. Fast Storage

* **SSD/NVMe:** Absolutely crucial for K3s performance, especially for the K3s data directory (default `/var/lib/rancher/k3s`, configurable with `--data-dir`), the containerd directories noted above, and the operating system itself. Pod startup times, image pulls, and database operations are heavily I/O bound.
* **RAID:** If using multiple drives, consider RAID1 or RAID10 for redundancy and increased I/O performance.

### b. Adequate RAM and CPU

* **RAM:** K3s server nodes (especially with the embedded SQLite datastore) need more RAM than agent nodes; worker nodes also need ample RAM for their pods. Err on the side of more RAM.
* **CPU:** Ensure sufficient CPU cores for K3s components, containers, and your workloads.

### c. Network Configuration

* **Gigabit Ethernet (at least):** 10 Gbps or faster is ideal for larger clusters or high-bandwidth applications.
* **MTU:** Ensure consistent MTU settings across all nodes and your network infrastructure. The K3s default CNI (Flannel with VXLAN) uses a smaller MTU on its overlay interface (e.g., 1450) due to encapsulation overhead. A misconfigured MTU can lead to packet fragmentation and performance issues (see the check below).
* **Jumbo Frames:** If your network supports them and all components are configured consistently, jumbo frames (e.g., a 9000-byte MTU) can reduce overhead and improve throughput, but they require careful, consistent configuration.

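A quick way to compare the MTU of the physical NIC and the Flannel overlay interface on a node; the physical interface name `eth0` is an example, while `flannel.1` is the default name of Flannel's VXLAN interface:

```bash
# MTU of the physical interface (replace eth0 with your NIC name)
ip link show eth0 | grep -o 'mtu [0-9]*'

# MTU of the Flannel VXLAN overlay interface
ip link show flannel.1 | grep -o 'mtu [0-9]*'
```
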
### d. Monitoring

* **Prometheus/Grafana:** Essential for monitoring resource usage (CPU, RAM, disk I/O, network) of your nodes and K3s components. This helps identify and diagnose bottlenecks (see the quick checks below).
* **kube-state-metrics:** Provides metrics about Kubernetes objects.
* **Node Exporter:** Provides system-level metrics.
* **cAdvisor (built into the kubelet):** Provides container-level metrics.

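For quick spot checks without a full monitoring stack, the metrics API is enough (this assumes metrics-server, or an equivalent, is still enabled in the cluster):

```bash
# Current CPU and memory usage per node and per pod
kubectl top nodes
kubectl top pods -A
```
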
### e. Logging

* **Centralized Logging (ELK Stack, Loki, etc.):** Stream logs from K3s components and pods to a central logging system for easier debugging, troubleshooting, and performance analysis.

## 4. Post-Optimization Verification

1. **Reboot:** After changing kernel parameters or `limits.conf`, a full system reboot is often the safest way to ensure all changes are fully applied.
2. **Verify sysctl settings:** `sudo sysctl -a | grep -i <parameter_name>` (e.g., `sudo sysctl -a | grep -i fs.file-max`).
3. **Verify ulimits:** Check `ulimit -n` and `ulimit -u` in a new shell. For specific running processes, inspect `/proc/<pid>/limits`.
4. **Monitor performance:** Use tools like `htop`, `iostat`, `ss`/`netstat`, `dstat`, and your installed monitoring stack (Prometheus/Grafana) to observe the impact of your changes. Look for reduced CPU usage, lower I/O wait, improved network throughput, and stable memory usage.
5. **Test workloads:** Deploy your actual applications and perform load testing to ensure the optimizations yield the desired performance benefits under realistic conditions. A few example verification commands follow.

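A minimal sketch of such a check pass; the process-name lookup assumes the K3s binary runs as a process named `k3s`:

```bash
# Spot-check a few of the tuned kernel parameters
sudo sysctl fs.file-max vm.max_map_count net.core.somaxconn

# Effective limits of the running K3s server process
cat /proc/"$(pgrep -xo k3s)"/limits | grep -E 'open files|processes'

# Basic cluster health
sudo k3s kubectl get nodes -o wide
```
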
By diligently following these steps, you can establish a robust and highly performant Debian environment for your K3s cluster. Always test changes in a staging or development environment before applying them to production systems.