Kubernetes Container Orchestration: A Comprehensive Guide to Cloud-Native Deployment
Introduction
Kubernetes has revolutionized the way we deploy, manage, and scale containerized applications in cloud-native environments. As the de facto standard for container orchestration, Kubernetes provides a robust platform for automating deployment, scaling, and operations of application containers across clusters of hosts.
This comprehensive guide explores Kubernetes fundamentals, advanced features, and best practices for building resilient, scalable cloud-native applications.
1. Kubernetes Architecture Overview
1.1 Control Plane Components
The Kubernetes control plane manages the overall state of the cluster and makes global decisions about scheduling, scaling, and lifecycle events.
# Example: Control Plane Configuration (kubeadm ClusterConfiguration)
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.0
controlPlaneEndpoint: "k8s-api.example.com:6443"
networking:
  serviceSubnet: "10.96.0.0/12"
  podSubnet: "10.244.0.0/16"
  dnsDomain: "cluster.local"
etcd:
  local:
    dataDir: "/var/lib/etcd"
apiServer:
  # Note: the API server's advertise address and bind port are set via
  # localAPIEndpoint in InitConfiguration, not in ClusterConfiguration.
  extraArgs:
    enable-admission-plugins: "NodeRestriction,ResourceQuota"
    audit-log-maxage: "30"
    audit-log-maxbackup: "3"
    audit-log-maxsize: "100"
    audit-log-path: "/var/log/audit.log"
controllerManager:
  extraArgs:
    bind-address: "0.0.0.0"
    cluster-signing-cert-file: "/etc/kubernetes/pki/ca.crt"
    cluster-signing-key-file: "/etc/kubernetes/pki/ca.key"
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
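The service and pod subnets in the networking block must be disjoint, and each node's podCIDR is carved out of the pod subnet. A quick sanity check with Python's standard ipaddress module, using the values from the example:

```python
import ipaddress

# Subnets from the example ClusterConfiguration
service_subnet = ipaddress.ip_network("10.96.0.0/12")
pod_subnet = ipaddress.ip_network("10.244.0.0/16")

# The CIDRs must be disjoint, or Service and Pod IPs will collide
assert not service_subnet.overlaps(pod_subnet)

# A pod IP allocated from a node's podCIDR falls inside the cluster pod subnet
assert ipaddress.ip_address("10.244.1.17") in pod_subnet
```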
1.2 Node Components
Worker nodes run the containerized applications and are managed by the control plane.
# Example: Node object (labels and spec are settable; status is reported by the kubelet)
apiVersion: v1
kind: Node
metadata:
  name: worker-node-1
  labels:
    kubernetes.io/hostname: worker-node-1
    node-role.kubernetes.io/worker: ""
    zone: us-west-1a
    instance-type: m5.large
spec:
  podCIDR: "10.244.1.0/24"
  providerID: "aws:///us-west-1a/i-1234567890abcdef0"
status:
  capacity:
    cpu: "2"
    memory: "8Gi"
    pods: "110"
    ephemeral-storage: "20Gi"
  allocatable:
    cpu: "1900m"
    memory: "7680Mi"
    pods: "110"
    ephemeral-storage: "18Gi"
  conditions:
  - type: Ready
    status: "True"
    lastHeartbeatTime: "2024-03-15T10:00:00Z"
    lastTransitionTime: "2024-03-15T09:00:00Z"
    reason: KubeletReady
    message: kubelet is posting ready status
  nodeInfo:
    machineID: "ec2-instance-id"
    systemUUID: "EC2-12345678-1234-1234-1234-123456789012"
    bootID: "12345678-1234-1234-1234-123456789012"
    kernelVersion: "5.4.0-1043-aws"
    osImage: "Ubuntu 20.04.3 LTS"
    containerRuntimeVersion: "containerd://1.6.6"
    kubeletVersion: "v1.28.0"
    kubeProxyVersion: "v1.28.0"
    operatingSystem: "linux"
    architecture: "amd64"
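Allocatable is what remains of capacity after the kubelet subtracts system reservations, Kubernetes reservations, and the eviction threshold. A minimal sketch of that arithmetic (the reservation values here are illustrative, not kubelet defaults):

```python
def allocatable(capacity_mi: int, system_reserved_mi: int,
                kube_reserved_mi: int, eviction_threshold_mi: int) -> int:
    """Node allocatable memory (MiB) = capacity minus reservations."""
    return capacity_mi - system_reserved_mi - kube_reserved_mi - eviction_threshold_mi

# An 8Gi node with roughly 0.5Gi held back, matching the example status above
print(allocatable(8192, 256, 156, 100))  # 7680 MiB
```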
2. Core Kubernetes Resources
2.1 Pods and Containers
Pods are the smallest deployable units in Kubernetes, containing one or more containers.
# Example: Multi-container Pod with sidecar pattern
apiVersion: v1
kind: Pod
metadata:
  name: web-app-with-sidecar
  namespace: production
  labels:
    app: web-app
    version: v1.2.0
    tier: frontend
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
spec:
  restartPolicy: Always
  serviceAccountName: web-app-sa
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
  containers:
  # Main application container
  - name: web-app
    image: myregistry.com/web-app:v1.2.0
    imagePullPolicy: IfNotPresent
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    env:
    - name: DATABASE_URL
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: url
    - name: REDIS_HOST
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: redis.host
    - name: LOG_LEVEL
      value: "INFO"
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 3
      failureThreshold: 3
    volumeMounts:
    - name: app-config
      mountPath: /etc/config
      readOnly: true
    - name: logs
      mountPath: /var/log/app
    - name: tmp
      mountPath: /tmp
  # Logging sidecar container
  - name: log-shipper
    image: fluent/fluent-bit:2.1.0
    env:
    - name: FLUENT_CONF
      value: fluent-bit.conf
    - name: FLUENT_OPT
      value: ""
    resources:
      requests:
        memory: "64Mi"
        cpu: "50m"
      limits:
        memory: "128Mi"
        cpu: "100m"
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
      readOnly: true
    - name: fluent-bit-config
      mountPath: /fluent-bit/etc
      readOnly: true
  # Metrics exporter sidecar
  - name: metrics-exporter
    image: prom/node-exporter:v1.6.0
    ports:
    - containerPort: 9090
      name: metrics
    args:
    - --web.listen-address=:9090  # node-exporter defaults to :9100
    - --path.procfs=/host/proc
    - --path.sysfs=/host/sys
    - --collector.filesystem.ignored-mount-points
    - "^/(sys|proc|dev|host|etc|rootfs/var/lib/docker/containers|rootfs/var/lib/docker/overlay2|rootfs/run/docker/netns|rootfs/var/lib/docker/aufs)($$|/)"
    resources:
      requests:
        memory: "32Mi"
        cpu: "25m"
      limits:
        memory: "64Mi"
        cpu: "50m"
    volumeMounts:
    - name: proc
      mountPath: /host/proc
      readOnly: true
    - name: sys
      mountPath: /host/sys
      readOnly: true
  volumes:
  - name: app-config
    configMap:
      name: app-config
  - name: logs
    emptyDir: {}
  - name: tmp
    emptyDir: {}
  - name: fluent-bit-config
    configMap:
      name: fluent-bit-config
  - name: proc
    hostPath:
      path: /proc
  - name: sys
    hostPath:
      path: /sys
  nodeSelector:
    zone: us-west-1a
  tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - web-app
          topologyKey: kubernetes.io/hostname
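The resource blocks above mix plain numbers, the m (milli) suffix for CPU, and binary suffixes for memory. A small parser for the two forms used in these manifests (a sketch, not the full Kubernetes quantity grammar):

```python
def parse_cpu(q: str) -> float:
    """'250m' -> 0.25 cores, '2' -> 2.0 cores."""
    return int(q[:-1]) / 1000 if q.endswith("m") else float(q)

def parse_memory(q: str) -> int:
    """'512Mi' -> bytes; supports only the Ki/Mi/Gi binary suffixes."""
    units = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
    for suffix, mult in units.items():
        if q.endswith(suffix):
            return int(q[:-2]) * mult
    return int(q)  # bare byte count

assert parse_cpu("250m") == 0.25
assert parse_memory("512Mi") == 536870912
```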
2.2 Deployments and ReplicaSets
Deployments provide declarative updates for Pods and ReplicaSets.
# Example: Production-ready Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-deployment
  namespace: production
  labels:
    app: web-app
    version: v1.2.0
  annotations:
    # deployment.kubernetes.io/revision is managed by the controller; don't set it
    kubernetes.io/change-cause: "Update to version v1.2.0 with security patches"
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
        version: v1.2.0
        tier: frontend
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: web-app-sa
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      containers:
      - name: web-app
        image: myregistry.com/web-app:v1.2.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          name: http
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
        - name: REDIS_HOST
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: redis.host
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
            httpHeaders:
            - name: Custom-Header
              value: liveness-probe
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
          successThreshold: 1
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
            httpHeaders:
            - name: Custom-Header
              value: readiness-probe
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
          successThreshold: 1
        startupProbe:
          httpGet:
            path: /startup
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 30
          successThreshold: 1
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - "sleep 15"
        volumeMounts:
        - name: app-config
          mountPath: /etc/config
          readOnly: true
        - name: cache
          mountPath: /var/cache/app
        - name: tmp
          mountPath: /tmp
      volumes:
      - name: app-config
        configMap:
          name: app-config
          defaultMode: 0644
      - name: cache
        emptyDir:
          sizeLimit: 1Gi
      - name: tmp
        emptyDir:
          sizeLimit: 500Mi
      nodeSelector:
        node-type: application
      tolerations:
      - key: "app-tier"
        operator: "Equal"
        value: "frontend"
        effect: "NoSchedule"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-app
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: zone
                operator: In
                values:
                - us-west-1a
                - us-west-1b
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      restartPolicy: Always
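During a RollingUpdate, maxSurge bounds how far the pod count may exceed the desired replicas and maxUnavailable bounds how far it may fall short; percentages round up for surge and down for unavailable. A sketch of those bounds:

```python
import math

def rolling_update_bounds(replicas: int, max_surge, max_unavailable):
    """Return (max pods, min available pods) during a RollingUpdate.
    Percentage values round up for surge and down for unavailable."""
    def resolve(value, round_up: bool) -> int:
        if isinstance(value, str) and value.endswith("%"):
            fraction = replicas * int(value[:-1]) / 100
            return math.ceil(fraction) if round_up else math.floor(fraction)
        return value
    surge = resolve(max_surge, round_up=True)
    unavailable = resolve(max_unavailable, round_up=False)
    return replicas + surge, replicas - unavailable

# The example Deployment: 5 replicas, maxSurge 2, maxUnavailable 1
print(rolling_update_bounds(5, 2, 1))  # (7, 4)
```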
2.3 Services and Networking
Services provide stable network endpoints for accessing Pods.
# Example: Comprehensive Service Configuration
apiVersion: v1
kind: Service
metadata:
  name: web-app-service
  namespace: production
  labels:
    app: web-app
    service-type: frontend
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: "/health"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "10"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "5"
    service.beta.kubernetes.io/aws-load-balancer-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-unhealthy-threshold: "3"
spec:
  type: LoadBalancer
  selector:
    app: web-app
  ports:
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP
  - name: https
    port: 443
    targetPort: 8080
    protocol: TCP
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  loadBalancerSourceRanges:
  - "10.0.0.0/8"
  - "172.16.0.0/12"
  - "192.168.0.0/16"
---
# Internal service for inter-service communication
apiVersion: v1
kind: Service
metadata:
  name: web-app-internal
  namespace: production
  labels:
    app: web-app
    service-type: internal
spec:
  type: ClusterIP
  selector:
    app: web-app
  ports:
  - name: http
    port: 8080
    targetPort: 8080
    protocol: TCP
  - name: metrics
    port: 9090
    targetPort: 9090
    protocol: TCP
---
# Headless service for StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: web-app-headless
  namespace: production
  labels:
    app: web-app
    service-type: headless
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: web-app
  ports:
  - name: http
    port: 8080
    targetPort: 8080
    protocol: TCP
  publishNotReadyAddresses: true
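Because the headless service sets clusterIP: None, DNS resolves directly to pod IPs, and each StatefulSet pod behind it gets a stable per-pod name. A sketch of how those names are composed (the web-app StatefulSet name is assumed for illustration):

```python
def stateful_pod_dns(statefulset: str, ordinal: int, service: str,
                     namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Stable DNS name for one StatefulSet pod behind a headless service:
    <pod>.<service>.<namespace>.svc.<cluster-domain>"""
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"

print(stateful_pod_dns("web-app", 0, "web-app-headless", "production"))
# web-app-0.web-app-headless.production.svc.cluster.local
```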
3. Advanced Kubernetes Features
3.1 StatefulSets for Stateful Applications
StatefulSets manage stateful applications with stable network identities and persistent storage.
# Example: Database StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql-cluster
  namespace: database
  labels:
    app: postgresql
    cluster: primary
spec:
  serviceName: postgresql-headless
  replicas: 3
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0
  selector:
    matchLabels:
      app: postgresql
      cluster: primary
  template:
    metadata:
      labels:
        app: postgresql
        cluster: primary
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9187"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: postgresql-sa
      securityContext:
        runAsUser: 999
        runAsGroup: 999
        fsGroup: 999
      initContainers:
      - name: init-postgresql
        image: postgres:15-alpine
        command:
        - /bin/sh  # the alpine image ships without bash
        - -c
        - |
          set -e
          # Initialize PostgreSQL data directory if it doesn't exist
          if [ ! -d "$PGDATA" ]; then
            echo "Initializing PostgreSQL data directory..."
            initdb --username=postgres --pwfile=/etc/postgresql/password
          fi
          # Configure PostgreSQL for replication
          cat >> "$PGDATA/postgresql.conf" <<EOF
          listen_addresses = '*'
          wal_level = replica
          max_wal_senders = 3
          max_replication_slots = 3
          hot_standby = on
          EOF
          # Configure authentication
          cat >> "$PGDATA/pg_hba.conf" <<EOF
          host replication replicator 0.0.0.0/0 md5
          host all all 0.0.0.0/0 md5
          EOF
        env:
        - name: PGDATA
          value: /var/lib/postgresql/data
        - name: POSTGRES_PASSWORD_FILE
          value: /etc/postgresql/password
        volumeMounts:
        - name: postgresql-data
          mountPath: /var/lib/postgresql/data
        - name: postgresql-config
          mountPath: /etc/postgresql
          readOnly: true
      containers:
      - name: postgresql
        image: postgres:15-alpine
        ports:
        - containerPort: 5432
          name: postgresql
        env:
        - name: POSTGRES_DB
          value: "myapp"
        - name: POSTGRES_USER
          value: "postgres"
        - name: POSTGRES_PASSWORD_FILE
          value: /etc/postgresql/password
        - name: PGDATA
          value: /var/lib/postgresql/data
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - "pg_isready -U postgres -h localhost"
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - "pg_isready -U postgres -h localhost"
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        volumeMounts:
        - name: postgresql-data
          mountPath: /var/lib/postgresql/data
        - name: postgresql-config
          mountPath: /etc/postgresql
          readOnly: true
        - name: postgresql-scripts
          mountPath: /docker-entrypoint-initdb.d
          readOnly: true
      # PostgreSQL Exporter for monitoring
      - name: postgres-exporter
        image: prometheuscommunity/postgres-exporter:v0.12.0
        ports:
        - containerPort: 9187
          name: metrics
        env:
        # POSTGRES_PASSWORD must be declared before DATA_SOURCE_NAME so the
        # $(POSTGRES_PASSWORD) reference can be expanded
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgresql-credentials
              key: password
        - name: DATA_SOURCE_NAME
          value: "postgresql://postgres:$(POSTGRES_PASSWORD)@localhost:5432/postgres?sslmode=disable"
        resources:
          requests:
            memory: "64Mi"
            cpu: "50m"
          limits:
            memory: "128Mi"
            cpu: "100m"
      volumes:
      - name: postgresql-config
        secret:
          secretName: postgresql-config
          defaultMode: 0600
      - name: postgresql-scripts
        configMap:
          name: postgresql-init-scripts
          defaultMode: 0755
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - postgresql
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node-type
                operator: In
                values:
                - database
  volumeClaimTemplates:
  - metadata:
      name: postgresql-data
      labels:
        app: postgresql
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "fast-ssd"
      resources:
        requests:
          storage: 100Gi
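The volumeClaimTemplates block creates one PersistentVolumeClaim per replica, named by the convention template-name, StatefulSet name, then ordinal. A sketch of that naming:

```python
def statefulset_pvc_names(template: str, statefulset: str, replicas: int):
    """PVCs created from a volumeClaimTemplate: <template>-<statefulset>-<ordinal>."""
    return [f"{template}-{statefulset}-{ordinal}" for ordinal in range(replicas)]

print(statefulset_pvc_names("postgresql-data", "postgresql-cluster", 3))
# ['postgresql-data-postgresql-cluster-0', 'postgresql-data-postgresql-cluster-1', 'postgresql-data-postgresql-cluster-2']
```

Because these PVCs are not deleted when the StatefulSet scales down, a replica that scales back up reattaches to its original volume.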
3.2 ConfigMaps and Secrets
Manage configuration data and sensitive information securely.
# Example: Application Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
  labels:
    app: web-app
    config-type: application
data:
  # Application configuration
  app.properties: |
    # Database configuration
    database.pool.min-size=5
    database.pool.max-size=20
    database.pool.timeout=30000
    # Cache configuration
    cache.type=redis
    cache.ttl=3600
    cache.max-memory=512mb
    # Logging configuration
    logging.level=INFO
    logging.format=json
    logging.output=stdout
    # Feature flags
    features.new-ui=true
    features.beta-features=false
    features.analytics=true
  # Redis configuration
  redis.conf: |
    bind 0.0.0.0
    port 6379
    timeout 0
    tcp-keepalive 300
    maxmemory 256mb
    maxmemory-policy allkeys-lru
    save 900 1
    save 300 10
    save 60 10000
  # Nginx configuration
  nginx.conf: |
    user nginx;
    worker_processes auto;
    error_log /var/log/nginx/error.log warn;
    pid /var/run/nginx.pid;

    events {
        worker_connections 1024;
        use epoll;
        multi_accept on;
    }

    http {
        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';
        access_log /var/log/nginx/access.log main;

        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 65;
        types_hash_max_size 2048;

        gzip on;
        gzip_vary on;
        gzip_min_length 1024;
        gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

        upstream backend {
            least_conn;
            server web-app-service:8080 max_fails=3 fail_timeout=30s;
            keepalive 32;
        }

        server {
            listen 80;
            server_name _;

            location /health {
                access_log off;
                return 200 "healthy\n";
                add_header Content-Type text/plain;
            }

            location / {
                proxy_pass http://backend;
                proxy_set_header Host $host;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Proto $scheme;
                proxy_connect_timeout 30s;
                proxy_send_timeout 30s;
                proxy_read_timeout 30s;
            }
        }
    }
---
# Example: Secrets for sensitive data
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
  namespace: production
  labels:
    app: web-app
    secret-type: application
type: Opaque
data:
  # Base64 encoded values
  database-url: cG9zdGdyZXNxbDovL3VzZXI6cGFzc3dvcmRAcG9zdGdyZXNxbDo1NDMyL215YXBw
  redis-password: cmVkaXNfc2VjcmV0X3Bhc3N3b3Jk
  jwt-secret: and0X3NlY3JldF9rZXlfZm9yX2F1dGhlbnRpY2F0aW9u
  api-key: YXBpX2tleV9mb3JfZXh0ZXJuYWxfc2VydmljZXM=
---
# Example: TLS Secret for HTTPS
apiVersion: v1
kind: Secret
metadata:
  name: tls-secret
  namespace: production
  labels:
    app: web-app
    secret-type: tls
type: kubernetes.io/tls
data:
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...
  tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0t...
---
# Example: Docker registry secret
apiVersion: v1
kind: Secret
metadata:
  name: registry-secret
  namespace: production
  labels:
    secret-type: registry
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: eyJhdXRocyI6eyJteXJlZ2lzdHJ5LmNvbSI6eyJ1c2VybmFtZSI6InVzZXIiLCJwYXNzd29yZCI6InBhc3MiLCJhdXRoIjoiZFhObGNqcHdZWE56In19fQ==
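Values under a Secret's data: field are base64-encoded, not encrypted; the encoding only makes arbitrary bytes safe to embed in YAML. The round trip, using an obviously fake credential:

```python
import base64

plaintext = "postgresql://user:password@postgresql:5432/myapp"  # fake credential

# This is what belongs under data: in the Secret manifest
encoded = base64.b64encode(plaintext.encode()).decode()
print(encoded)

# Decoding recovers the original value exactly; anyone who can read the
# Secret can do this, which is why RBAC and encryption at rest matter
assert base64.b64decode(encoded).decode() == plaintext
```

Alternatively, a stringData: field accepts plaintext and the API server performs the encoding on write.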
3.3 Ingress Controllers and Traffic Management
Manage external access to services with advanced routing capabilities.
# Example: Comprehensive Ingress Configuration
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app-ingress
  namespace: production
  labels:
    app: web-app
    ingress-type: public
  annotations:
    # SSL/TLS configuration
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    # Rate limiting
    nginx.ingress.kubernetes.io/rate-limit: "100"
    nginx.ingress.kubernetes.io/rate-limit-window: "1m"
    # CORS configuration
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-origin: "https://example.com,https://app.example.com"
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET,POST,PUT,DELETE,OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-headers: "DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization"
    # Security headers
    nginx.ingress.kubernetes.io/configuration-snippet: |
      add_header X-Frame-Options "SAMEORIGIN" always;
      add_header X-Content-Type-Options "nosniff" always;
      add_header X-XSS-Protection "1; mode=block" always;
      add_header Referrer-Policy "strict-origin-when-cross-origin" always;
      add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline';" always;
    # Load balancing
    nginx.ingress.kubernetes.io/upstream-hash-by: "$request_uri"
    nginx.ingress.kubernetes.io/load-balance: "ewma"
    # Timeouts
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "30"
    # Buffer sizes
    nginx.ingress.kubernetes.io/proxy-buffer-size: "8k"
    nginx.ingress.kubernetes.io/proxy-buffers-number: "8"
    # Custom error pages
    nginx.ingress.kubernetes.io/custom-http-errors: "404,503"
    nginx.ingress.kubernetes.io/default-backend: "error-pages"
spec:
  # ingressClassName replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  tls:
  - hosts:
    - api.example.com
    - app.example.com
    secretName: tls-secret
  rules:
  # API endpoints
  - host: api.example.com
    http:
      paths:
      - path: /api/v1
        pathType: Prefix
        backend:
          service:
            name: web-app-service
            port:
              number: 80
      - path: /api/v2
        pathType: Prefix
        backend:
          service:
            name: web-app-v2-service
            port:
              number: 80
      - path: /health
        pathType: Exact
        backend:
          service:
            name: web-app-service
            port:
              number: 80
      - path: /metrics
        pathType: Exact
        backend:
          service:
            name: web-app-service
            port:
              number: 9090
  # Web application
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 80
      - path: /static
        pathType: Prefix
        backend:
          service:
            name: static-files-service
            port:
              number: 80
---
# Example: Advanced Ingress with canary deployment
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app-canary
  namespace: production
  labels:
    app: web-app
    deployment-type: canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "always"
    nginx.ingress.kubernetes.io/canary-by-cookie: "canary"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.example.com
    secretName: tls-secret
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api/v1
        pathType: Prefix
        backend:
          service:
            name: web-app-canary-service
            port:
              number: 80
4. Kubernetes Security Best Practices
4.1 Role-Based Access Control (RBAC)
Implement fine-grained access control for cluster resources.
# Example: Comprehensive RBAC Configuration
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-app-sa
  namespace: production
  labels:
    app: web-app
    component: service-account
automountServiceAccountToken: true
---
# ClusterRole for cross-namespace operations
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: web-app-cluster-role
  labels:
    app: web-app
    rbac-type: cluster
rules:
# Read access to nodes for monitoring
- apiGroups: [""]
  resources: ["nodes", "nodes/metrics", "nodes/stats"]
  verbs: ["get", "list", "watch"]
# Read access to cluster-level metrics
- apiGroups: ["metrics.k8s.io"]
  resources: ["nodes", "pods"]
  verbs: ["get", "list"]
---
# Role for namespace-specific operations
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: web-app-role
  namespace: production
  labels:
    app: web-app
    rbac-type: namespace
rules:
# Pod management
- apiGroups: [""]
  resources: ["pods", "pods/log", "pods/status"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]
# ConfigMap and Secret access
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get", "list", "watch"]
# Service access
- apiGroups: [""]
  resources: ["services", "endpoints"]
  verbs: ["get", "list", "watch"]
# Deployment management (limited)
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments/scale"]
  verbs: ["patch", "update"]
# Ingress access
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]
# Event access for debugging
- apiGroups: [""]
  resources: ["events"]
  verbs: ["get", "list", "watch"]
---
# ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: web-app-cluster-binding
  labels:
    app: web-app
    rbac-type: cluster
subjects:
- kind: ServiceAccount
  name: web-app-sa
  namespace: production
roleRef:
  kind: ClusterRole
  name: web-app-cluster-role
  apiGroup: rbac.authorization.k8s.io
---
# RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: web-app-binding
  namespace: production
  labels:
    app: web-app
    rbac-type: namespace
subjects:
- kind: ServiceAccount
  name: web-app-sa
  namespace: production
roleRef:
  kind: Role
  name: web-app-role
  apiGroup: rbac.authorization.k8s.io
4.2 Pod Security Standards
Implement security policies to protect workloads.
# Example: Pod Security Standards enforced via namespace labels
# (PodSecurityPolicy was deprecated in v1.21 and removed in v1.25;
# use the built-in Pod Security admission controller instead)
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.28
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
# A pod securityContext that satisfies the "restricted" profile:
# non-root, no privilege escalation, all capabilities dropped,
# default seccomp profile
apiVersion: v1
kind: Pod
metadata:
  name: restricted-example
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myregistry.com/web-app:v1.2.0
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
---
# Example: Network Policy for traffic isolation
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-app-network-policy
  namespace: production
  labels:
    app: web-app
    policy-type: network
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Allow traffic from the ingress controller. Note: namespaceSelector and
  # podSelector in the SAME list entry means both must match (AND);
  # separate entries would mean either matches (OR).
  - from:
    - namespaceSelector:
        matchLabels:
          name: ingress-nginx
      podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
  # Allow traffic from monitoring namespace
  - from:
    - namespaceSelector:
        matchLabels:
          name: monitoring
    ports:
    - protocol: TCP
      port: 9090
  # Allow traffic from same namespace
  - from:
    - namespaceSelector:
        matchLabels:
          name: production
    ports:
    - protocol: TCP
      port: 8080
  egress:
  # Allow DNS resolution
  - to: []
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  # Allow HTTPS to external services
  - to: []
    ports:
    - protocol: TCP
      port: 443
  # Allow database access (namespace AND pod selector must both match)
  - to:
    - namespaceSelector:
        matchLabels:
          name: database
      podSelector:
        matchLabels:
          app: postgresql
    ports:
    - protocol: TCP
      port: 5432
  # Allow Redis access
  - to:
    - namespaceSelector:
        matchLabels:
          name: cache
      podSelector:
        matchLabels:
          app: redis
    ports:
    - protocol: TCP
      port: 6379
5. Monitoring and Observability
5.1 Prometheus and Grafana Setup
Implement comprehensive monitoring for Kubernetes clusters.
# Example: Prometheus Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
  labels:
    app: prometheus
    component: config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
        cluster: 'production'
        region: 'us-west-1'

    # alert_rules.yml is mounted alongside this file at /etc/prometheus
    rule_files:
      - "/etc/prometheus/alert_rules.yml"

    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          - alertmanager:9093

    scrape_configs:
    # Kubernetes API server
    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

    # Kubernetes nodes
    - job_name: 'kubernetes-nodes'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

    # Kubernetes pods
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

    # Application metrics
    - job_name: 'web-app'
      kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
          - production
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        action: keep
        regex: web-app
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        action: keep
        regex: metrics
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_service_name

  # Alert rules
  alert_rules.yml: |
    groups:
    - name: kubernetes.rules
      rules:
      # High CPU usage
      - alert: HighCPUUsage
        expr: (100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for more than 5 minutes on {{ $labels.instance }}"
      # High memory usage
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 85% for more than 5 minutes on {{ $labels.instance }}"
      # Pod crash looping
      - alert: PodCrashLooping
        expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Pod is crash looping"
          description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
      # Application down
      - alert: ApplicationDown
        expr: up{job="web-app"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Application is down"
          description: "Application {{ $labels.kubernetes_service_name }} in namespace {{ $labels.kubernetes_namespace }} is down"
---
# Prometheus Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
    component: server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    spec:
      serviceAccountName: prometheus
      securityContext:
        runAsUser: 65534
        runAsGroup: 65534
        fsGroup: 65534
      containers:
      - name: prometheus
        image: prom/prometheus:v2.45.0
        args:
        - '--config.file=/etc/prometheus/prometheus.yml'
        - '--storage.tsdb.path=/prometheus/'
        - '--web.console.libraries=/etc/prometheus/console_libraries'
        - '--web.console.templates=/etc/prometheus/consoles'
        - '--storage.tsdb.retention.time=15d'
        - '--web.enable-lifecycle'
        - '--web.enable-admin-api'
        ports:
        - containerPort: 9090
          name: web
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
          readOnly: true
        - name: storage
          mountPath: /prometheus
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
      volumes:
      - name: config
        configMap:
          name: prometheus-config
      - name: storage
        persistentVolumeClaim:
          claimName: prometheus-storage
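The __address__ rewrite in the kubernetes-pods job replaces a discovered pod address's port with the value of the prometheus.io/port annotation. Prometheus joins the source labels with ';' and anchors the regex against the whole string, which can be reproduced with Python's re module:

```python
import re

# Source labels joined by ';': __address__ then the prometheus.io/port annotation
joined = "10.244.1.17:8080;9090"

# The relabel regex from the kubernetes-pods job
pattern = re.compile(r"([^:]+)(?::\d+)?;(\d+)")

match = pattern.fullmatch(joined)  # Prometheus regexes are fully anchored
address = f"{match.group(1)}:{match.group(2)}"  # replacement: $1:$2
assert address == "10.244.1.17:9090"
```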
6. Best Practices and Optimization
6.1 Resource Management
Implement proper resource requests and limits for optimal cluster utilization.
# Example: Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
  labels:
    app: web-app
    autoscaling-type: horizontal
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app-deployment
  minReplicas: 3
  maxReplicas: 50
  metrics:
  # CPU utilization
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  # Memory utilization
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  # Custom metric (requests per second); requires a custom metrics adapter
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
      - type: Pods
        value: 5
        periodSeconds: 60
      selectPolicy: Max
---
# Example: Vertical Pod Autoscaler (requires the VPA components to be installed)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
  labels:
    app: web-app
    autoscaling-type: vertical
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app-deployment
  # Caution: avoid "Auto" mode on cpu/memory for a workload an HPA already
  # scales on those same metrics; the two controllers will fight each other.
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: web-app
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2000m
        memory: 2Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
---
# Example: Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: production
  labels:
    app: web-app
    policy-type: disruption
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app
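The autoscaling policies above assume the target Deployment declares resource requests; without them, the HPA's CPU and memory Utilization targets cannot be computed. A minimal sketch of the container spec the HPA and VPA would act on (the container name matches the VPA policy above; the image and values are illustrative placeholders):
# Hypothetical container spec for web-app-deployment; values are placeholders
containers:
- name: web-app
  image: example.com/web-app:1.0   # illustrative image reference
  resources:
    requests:
      cpu: 250m        # baseline the HPA's averageUtilization is computed against
      memory: 256Mi
    limits:
      cpu: 500m
      memory: 512Mi
The Utilization targets are percentages of these requests, so a 70% CPU target here means the HPA scales out once average usage exceeds roughly 175m per pod.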
6.2 Cluster Autoscaling
Configure the cluster autoscaler so that nodes are added when pods cannot be scheduled and removed when they sit underutilized.
# Example: Cluster Autoscaler Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8085"
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/production
        - --balance-similar-node-groups
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=10m
        - --scale-down-unneeded-time=10m
        - --scale-down-utilization-threshold=0.5
        - --max-node-provision-time=15m
        - --max-empty-bulk-delete=10
        - --max-graceful-termination-sec=600
        env:
        - name: AWS_REGION
          value: us-west-1
        volumeMounts:
        - name: ssl-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
        imagePullPolicy: "Always"
      volumes:
      - name: ssl-certs
        hostPath:
          path: "/etc/ssl/certs/ca-bundle.crt"
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
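Scale-down can also be tuned per workload: the cluster autoscaler honors a well-known annotation that marks a pod as unsafe to evict, which keeps its node from being drained during scale-down. A minimal sketch (the pod name and image are illustrative):
# Hypothetical pod opting out of cluster-autoscaler eviction
apiVersion: v1
kind: Pod
metadata:
  name: stateful-worker   # illustrative name
  annotations:
    # Tells the cluster autoscaler not to evict this pod when scaling down
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: worker
    image: example.com/worker:1.0   # illustrative image
Use this sparingly: every pod carrying the annotation pins its node and reduces how much the autoscaler can consolidate.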
7. Disaster Recovery and Backup
7.1 Backup Strategies
Implement regular, tested backups covering both Kubernetes resources and the persistent data behind them.
# Example: Velero Backup Configuration
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: daily-backup
  namespace: velero
  labels:
    backup-type: scheduled
spec:
  # Back up these namespaces
  includedNamespaces:
  - production
  - staging
  - monitoring
  # Exclude specific resources
  excludedResources:
  - events
  - events.events.k8s.io
  # Include cluster-scoped resources
  includeClusterResources: true
  # Storage location
  storageLocation: default
  # Volume snapshot locations
  volumeSnapshotLocations:
  - default
  # TTL for backup retention
  ttl: 720h0m0s  # 30 days
  # Hooks for application-consistent backups
  hooks:
    resources:
    - name: postgresql-backup-hook
      includedNamespaces:
      - database
      includedResources:
      - pods
      labelSelector:
        matchLabels:
          app: postgresql
      pre:
      - exec:
          container: postgresql
          command:
          - /bin/bash
          - -c
          - "pg_dump -U postgres myapp > /tmp/backup.sql"
          onError: Fail
          timeout: 5m
      post:
      - exec:
          container: postgresql
          command:
          - /bin/bash
          - -c
          - "rm -f /tmp/backup.sql"
---
# Example: Scheduled Backup
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup-schedule
  namespace: velero
  labels:
    schedule-type: daily
spec:
  # Run daily at 2 AM UTC
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
    - production
    - staging
    - monitoring
    excludedResources:
    - events
    - events.events.k8s.io
    includeClusterResources: true
    storageLocation: default
    volumeSnapshotLocations:
    - default
    ttl: 720h0m0s
---
# Example: Restore Configuration
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: production-restore
  namespace: velero
  labels:
    restore-type: disaster-recovery
spec:
  # Source backup
  backupName: daily-backup-20240315
  # Restore specific namespaces
  includedNamespaces:
  - production
  # Exclude specific resources during restore
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  # Namespace mapping, to restore into a different namespace
  namespaceMapping:
    production: production-restored
  # Restore PVs
  restorePVs: true
  # Preserve node ports
  preserveNodePorts: false
  # Include cluster resources
  includeClusterResources: true
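The Backup and Schedule objects above reference storageLocation: default; that location must exist as a BackupStorageLocation in the velero namespace. A sketch assuming the AWS object-store plugin is installed (the bucket name, prefix, and region are placeholders):
# Example: Backup Storage Location (assumes velero-plugin-for-aws is installed)
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: my-velero-backups   # placeholder bucket name
    prefix: cluster-1           # placeholder key prefix
  config:
    region: us-west-1
Keeping the backup bucket in a separate account or region from the cluster limits the blast radius of a regional outage or credential compromise.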
Conclusion
Kubernetes container orchestration provides a powerful foundation for building and managing cloud-native applications at scale. This comprehensive guide has covered the essential concepts, advanced features, and best practices necessary for successful Kubernetes deployments.
Key takeaways for effective Kubernetes adoption:
- Start with solid fundamentals: Master core concepts like Pods, Services, and Deployments before moving to advanced features.
- Implement security from day one: Use RBAC, network policies, and security contexts to protect your workloads.
- Plan for observability: Implement comprehensive monitoring, logging, and tracing from the beginning.
- Automate everything: Leverage GitOps, CI/CD pipelines, and infrastructure as code for consistent deployments.
- Design for resilience: Use health checks, resource limits, and disruption budgets to build fault-tolerant applications.
- Optimize resource utilization: Implement autoscaling and resource management to maximize efficiency.
- Prepare for disasters: Establish backup and recovery procedures to protect against data loss.
By following these practices and continuously learning about new Kubernetes features, you can build robust, scalable, and maintainable cloud-native applications that leverage the full power of container orchestration.
Remember that Kubernetes is a rapidly evolving ecosystem, so stay updated with the latest releases, security patches, and community best practices to ensure your deployments remain secure and efficient.