Performance Optimization for Cloud-Native Applications: From Profiling to Production Tuning
Performance optimization is a critical aspect of running successful cloud-native applications at scale. This comprehensive guide covers systematic approaches to identifying performance bottlenecks, implementing optimization strategies, and maintaining optimal performance in production environments.
Performance Optimization Methodology
Performance Engineering Lifecycle
graph TB
subgraph "Performance Engineering Lifecycle"
A[Requirements Analysis] --> B[Baseline Measurement]
B --> C[Profiling & Analysis]
C --> D[Optimization Implementation]
D --> E[Testing & Validation]
E --> F[Production Deployment]
F --> G[Continuous Monitoring]
G --> C
end
subgraph "Key Metrics"
H[Response Time]
I[Throughput]
J[Resource Utilization]
K[Error Rate]
L[Scalability]
end
subgraph "Optimization Areas"
M[Application Code]
N[Database Queries]
O[Caching Layer]
P[Network I/O]
Q[Resource Allocation]
end
C --> H
C --> I
C --> J
C --> K
C --> L
D --> M
D --> N
D --> O
D --> P
D --> Q
Performance Metrics Framework
| Category | Metric | Target | Measurement Method |
|---|---|---|---|
| Latency | P50 Response Time | < 100ms | Application metrics |
| Latency | P95 Response Time | < 500ms | Application metrics |
| Latency | P99 Response Time | < 1000ms | Application metrics |
| Throughput | Requests per Second | > 1000 RPS | Load testing |
| Throughput | Transactions per Second | > 500 TPS | Database metrics |
| Resource | CPU Utilization | < 70% | System metrics |
| Resource | Memory Utilization | < 80% | System metrics |
| Resource | Disk I/O | < 80% | System metrics |
| Availability | Uptime | > 99.9% | Health checks |
| Availability | Error Rate | < 0.1% | Application logs |
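When these latency targets are enforced with Prometheus histograms, quantile accuracy depends on bucket layout: histogram_quantile interpolates linearly inside a bucket, so boundaries should sit on or near the P50/P95/P99 targets. A minimal sketch with illustrative bucket values (the profiler example later in this guide uses prometheus.DefBuckets, which also covers this range):

// metrics/buckets.go (illustrative sketch)
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Bucket boundaries placed on the SLO targets (0.1s, 0.5s, 1.0s) keep
// histogram_quantile estimates accurate exactly where the thresholds sit.
var requestLatency = promauto.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "Duration of HTTP requests in seconds",
		Buckets: []float64{0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5},
	},
	[]string{"method", "endpoint", "status_code"},
)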
Application Profiling and Analysis
Go Application Profiling
// profiling/profiler.go
package main
import (
	"context"
	"fmt"
	"log"
	"net/http"
	httppprof "net/http/pprof" // named import so handlers can be registered on a custom mux
	"os"
	"runtime"
	"runtime/pprof"
	"testing"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)
// Custom metrics for performance monitoring
var (
requestDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "http_request_duration_seconds",
Help: "Duration of HTTP requests in seconds",
Buckets: prometheus.DefBuckets,
},
[]string{"method", "endpoint", "status_code"},
)
memoryUsage = promauto.NewGaugeVec(
prometheus.GaugeOpts{
Name: "memory_usage_bytes",
Help: "Current memory usage in bytes",
},
[]string{"type"},
)
goroutineCount = promauto.NewGauge(
prometheus.GaugeOpts{
Name: "goroutines_total",
Help: "Current number of goroutines",
},
)
gcDuration = promauto.NewHistogram(
prometheus.HistogramOpts{
Name: "gc_duration_seconds",
Help: "Duration of garbage collection cycles",
Buckets: []float64{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0},
},
)
)
type PerformanceProfiler struct {
	cpuProfileFile   string
	memProfileFile   string
	blockProfileFile string
	mutexProfileFile string
	traceFile        string
	profilingEnabled bool
	// lastPauseTotalNs remembers the cumulative GC pause time from the
	// previous collection so per-interval deltas can be observed.
	lastPauseTotalNs uint64
}
func NewPerformanceProfiler() *PerformanceProfiler {
return &PerformanceProfiler{
cpuProfileFile: "cpu.prof",
memProfileFile: "mem.prof",
blockProfileFile: "block.prof",
mutexProfileFile: "mutex.prof",
traceFile: "trace.out",
profilingEnabled: os.Getenv("ENABLE_PROFILING") == "true",
}
}
func (pp *PerformanceProfiler) StartCPUProfiling() error {
if !pp.profilingEnabled {
return nil
}
f, err := os.Create(pp.cpuProfileFile)
if err != nil {
return fmt.Errorf("could not create CPU profile: %w", err)
}
if err := pprof.StartCPUProfile(f); err != nil {
f.Close()
return fmt.Errorf("could not start CPU profile: %w", err)
}
log.Printf("CPU profiling started, writing to %s", pp.cpuProfileFile)
return nil
}
func (pp *PerformanceProfiler) StopCPUProfiling() {
if !pp.profilingEnabled {
return
}
pprof.StopCPUProfile()
log.Printf("CPU profiling stopped")
}
func (pp *PerformanceProfiler) WriteMemoryProfile() error {
if !pp.profilingEnabled {
return nil
}
f, err := os.Create(pp.memProfileFile)
if err != nil {
return fmt.Errorf("could not create memory profile: %w", err)
}
defer f.Close()
runtime.GC() // Force GC before memory profile
if err := pprof.WriteHeapProfile(f); err != nil {
return fmt.Errorf("could not write memory profile: %w", err)
}
log.Printf("Memory profile written to %s", pp.memProfileFile)
return nil
}
func (pp *PerformanceProfiler) WriteBlockProfile() error {
if !pp.profilingEnabled {
return nil
}
f, err := os.Create(pp.blockProfileFile)
if err != nil {
return fmt.Errorf("could not create block profile: %w", err)
}
defer f.Close()
if err := pprof.Lookup("block").WriteTo(f, 0); err != nil {
return fmt.Errorf("could not write block profile: %w", err)
}
log.Printf("Block profile written to %s", pp.blockProfileFile)
return nil
}
func (pp *PerformanceProfiler) WriteMutexProfile() error {
if !pp.profilingEnabled {
return nil
}
f, err := os.Create(pp.mutexProfileFile)
if err != nil {
return fmt.Errorf("could not create mutex profile: %w", err)
}
defer f.Close()
if err := pprof.Lookup("mutex").WriteTo(f, 0); err != nil {
return fmt.Errorf("could not write mutex profile: %w", err)
}
log.Printf("Mutex profile written to %s", pp.mutexProfileFile)
return nil
}
// Performance monitoring middleware
func (pp *PerformanceProfiler) PerformanceMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
start := time.Now()
// Wrap response writer to capture status code
wrapped := &responseWriter{ResponseWriter: w, statusCode: http.StatusOK}
// Process request
next.ServeHTTP(wrapped, r)
		// Record metrics. Labeling by raw URL path can explode label
		// cardinality; prefer a route template when the router provides one.
		duration := time.Since(start).Seconds()
requestDuration.WithLabelValues(
r.Method,
r.URL.Path,
fmt.Sprintf("%d", wrapped.statusCode),
).Observe(duration)
})
}
type responseWriter struct {
http.ResponseWriter
statusCode int
}
func (rw *responseWriter) WriteHeader(code int) {
rw.statusCode = code
rw.ResponseWriter.WriteHeader(code)
}
// Runtime metrics collector
func (pp *PerformanceProfiler) StartMetricsCollection(ctx context.Context) {
ticker := time.NewTicker(10 * time.Second)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
pp.collectRuntimeMetrics()
}
}
}
func (pp *PerformanceProfiler) collectRuntimeMetrics() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// Memory metrics
	memoryUsage.WithLabelValues("heap_alloc").Set(float64(m.HeapAlloc))
	memoryUsage.WithLabelValues("heap_sys").Set(float64(m.HeapSys))
	memoryUsage.WithLabelValues("heap_idle").Set(float64(m.HeapIdle))
	memoryUsage.WithLabelValues("heap_inuse").Set(float64(m.HeapInuse))
	memoryUsage.WithLabelValues("stack_inuse").Set(float64(m.StackInuse))
	memoryUsage.WithLabelValues("stack_sys").Set(float64(m.StackSys))
	// Goroutine count
	goroutineCount.Set(float64(runtime.NumGoroutine()))
	// GC metrics: PauseTotalNs is cumulative over the process lifetime,
	// so observe only the pause time accrued since the last collection.
	if m.PauseTotalNs > pp.lastPauseTotalNs {
		gcDuration.Observe(float64(m.PauseTotalNs-pp.lastPauseTotalNs) / 1e9)
	}
	pp.lastPauseTotalNs = m.PauseTotalNs
}
// Benchmark helpers. In practice these belong in a _test.go file, since
// importing "testing" from production code registers test flags globally.
func BenchmarkCriticalPath(b *testing.B, fn func()) {
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		fn()
	}
}
func ProfileFunction(name string, fn func()) {
start := time.Now()
fn()
duration := time.Since(start)
log.Printf("Function %s took %v", name, duration)
}
// Example usage in main application
func main() {
profiler := NewPerformanceProfiler()
// Enable block and mutex profiling
runtime.SetBlockProfileRate(1)
runtime.SetMutexProfileFraction(1)
// Start CPU profiling
if err := profiler.StartCPUProfiling(); err != nil {
log.Printf("Failed to start CPU profiling: %v", err)
}
defer profiler.StopCPUProfiling()
// Start metrics collection
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
go profiler.StartMetricsCollection(ctx)
// Set up HTTP server with profiling endpoints
mux := http.NewServeMux()
// Application endpoints
mux.HandleFunc("/api/users", handleUsers)
mux.HandleFunc("/api/orders", handleOrders)
	// Profiling endpoints (only in development); these come from net/http/pprof,
	// imported under the name httppprof to avoid clashing with runtime/pprof
	if os.Getenv("ENVIRONMENT") == "development" {
		mux.HandleFunc("/debug/pprof/", httppprof.Index)
		mux.HandleFunc("/debug/pprof/cmdline", httppprof.Cmdline)
		mux.HandleFunc("/debug/pprof/profile", httppprof.Profile)
		mux.HandleFunc("/debug/pprof/symbol", httppprof.Symbol)
		mux.HandleFunc("/debug/pprof/trace", httppprof.Trace)
	}
// Metrics endpoint
mux.Handle("/metrics", promhttp.Handler())
// Apply performance middleware
handler := profiler.PerformanceMiddleware(mux)
	log.Println("Server starting on :8080")
	// log.Fatal calls os.Exit, which skips deferred calls, so stop
	// CPU profiling explicitly before exiting on a server error.
	err := http.ListenAndServe(":8080", handler)
	profiler.StopCPUProfiling()
	log.Fatal(err)
}
func handleUsers(w http.ResponseWriter, r *http.Request) {
// Simulate some work
time.Sleep(50 * time.Millisecond)
w.WriteHeader(http.StatusOK)
w.Write([]byte(`{"users": []}`))
}
func handleOrders(w http.ResponseWriter, r *http.Request) {
// Simulate database query
time.Sleep(100 * time.Millisecond)
w.WriteHeader(http.StatusOK)
w.Write([]byte(`{"orders": []}`))
}
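The helpers above are easiest to exercise from a test file. A minimal sketch, where processOrder is a hypothetical hot path standing in for real application logic:

// profiling/profiler_test.go (illustrative sketch)
package main

import "testing"

// processOrder is a stand-in for the code path being measured.
func processOrder() {
	// ... application logic under test ...
}

func BenchmarkProcessOrder(b *testing.B) {
	BenchmarkCriticalPath(b, processOrder)
}

Benchmarks can emit their own profiles with go test -bench=. -cpuprofile=cpu.prof -memprofile=mem.prof, and any generated profile can be inspected with go tool pprof cpu.prof, or go tool pprof -http=:6060 cpu.prof for the web UI.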
Database Query Optimization
-- database/optimization.sql
-- Index optimization for common query patterns
CREATE INDEX CONCURRENTLY idx_users_email_active
ON users (email)
WHERE active = true;
CREATE INDEX CONCURRENTLY idx_orders_user_created
ON orders (user_id, created_at DESC)
INCLUDE (total_amount, status);
CREATE INDEX CONCURRENTLY idx_products_category_price
ON products (category_id, price)
WHERE available = true;
-- Partial index for recent data
CREATE INDEX CONCURRENTLY idx_logs_recent
ON application_logs (created_at, level)
WHERE created_at > NOW() - INTERVAL '7 days';
-- Query optimization examples
-- Before: Inefficient query
-- SELECT * FROM orders o
-- JOIN users u ON o.user_id = u.id
-- WHERE o.created_at > '2024-01-01'
-- ORDER BY o.created_at DESC;
-- After: Optimized query with specific columns and better indexing
SELECT o.id, o.total_amount, o.status, o.created_at, u.email, u.name
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.created_at > '2024-01-01'
AND o.status IN ('pending', 'processing')
ORDER BY o.created_at DESC
LIMIT 100;
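-- Validate the rewrite with EXPLAIN before and after adding the indexes above;
-- a sketch (adapt the predicates to your own workload):
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total_amount, o.status, o.created_at, u.email, u.name
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.created_at > '2024-01-01'
AND o.status IN ('pending', 'processing')
ORDER BY o.created_at DESC
LIMIT 100;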
-- Materialized view for complex aggregations
CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT
DATE(created_at) as sale_date,
COUNT(*) as order_count,
SUM(total_amount) as total_revenue,
AVG(total_amount) as avg_order_value,
COUNT(DISTINCT user_id) as unique_customers
FROM orders
WHERE status = 'completed'
GROUP BY DATE(created_at)
ORDER BY sale_date DESC;
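-- REFRESH ... CONCURRENTLY (used below) requires at least one unique index
-- on the materialized view:
CREATE UNIQUE INDEX idx_daily_sales_summary_date
ON daily_sales_summary (sale_date);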
-- Refresh materialized view (can be automated)
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales_summary;
-- Query performance monitoring
-- Note: slow-query logging is handled by log_min_duration_statement (see the
-- postgresql.conf settings below); an event trigger cannot intercept queries.
-- For analysis, the pg_stat_statements extension aggregates per-query
-- execution statistics:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by total execution time (column names per PostgreSQL 13+)
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Connection pooling configuration
-- postgresql.conf optimizations
/*
# Connection settings
max_connections = 200
shared_buffers = 256MB
effective_cache_size = 1GB
work_mem = 4MB
maintenance_work_mem = 64MB
# Checkpoint settings
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
# Query planner settings
random_page_cost = 1.1
effective_io_concurrency = 200
# Logging settings
log_min_duration_statement = 1000
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
*/
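The server-side settings above only cap what the database will accept; the application's connection pool has to stay within that budget. A minimal Go sketch (the driver choice, DSN, and pool sizes are illustrative assumptions, not values from this guide's deployment):

// db/pool.go (illustrative sketch)
package db

import (
	"database/sql"
	"time"

	_ "github.com/lib/pq" // one common PostgreSQL driver
)

func NewPool(dsn string) (*sql.DB, error) {
	pool, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	// Stay well under the server's max_connections (200 above), leaving
	// headroom for other services, migrations, and admin sessions.
	pool.SetMaxOpenConns(50)
	pool.SetMaxIdleConns(10)
	pool.SetConnMaxLifetime(30 * time.Minute)
	return pool, pool.Ping()
}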
Caching Strategy Implementation
// cache/redis_cache.go
package cache
import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"strings"
	"sync"
	"time"

	"github.com/go-redis/redis/v8"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)
var (
cacheHits = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "cache_hits_total",
Help: "Total number of cache hits",
},
[]string{"cache_type", "key_pattern"},
)
cacheMisses = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "cache_misses_total",
Help: "Total number of cache misses",
},
[]string{"cache_type", "key_pattern"},
)
cacheOperationDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "cache_operation_duration_seconds",
Help: "Duration of cache operations",
Buckets: []float64{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0},
},
[]string{"operation", "cache_type"},
)
)
type CacheConfig struct {
RedisAddr string
RedisPassword string
RedisDB int
DefaultTTL time.Duration
MaxRetries int
}
type Cache struct {
client *redis.Client
defaultTTL time.Duration
}
func NewCache(config CacheConfig) *Cache {
rdb := redis.NewClient(&redis.Options{
Addr: config.RedisAddr,
Password: config.RedisPassword,
DB: config.RedisDB,
MaxRetries: config.MaxRetries,
PoolSize: 20,
MinIdleConns: 5,
})
return &Cache{
client: rdb,
defaultTTL: config.DefaultTTL,
}
}
func (c *Cache) Get(ctx context.Context, key string, dest interface{}) error {
start := time.Now()
defer func() {
cacheOperationDuration.WithLabelValues("get", "redis").Observe(time.Since(start).Seconds())
}()
val, err := c.client.Get(ctx, key).Result()
if err != nil {
if err == redis.Nil {
cacheMisses.WithLabelValues("redis", c.getKeyPattern(key)).Inc()
return ErrCacheMiss
}
return fmt.Errorf("cache get error: %w", err)
}
cacheHits.WithLabelValues("redis", c.getKeyPattern(key)).Inc()
if err := json.Unmarshal([]byte(val), dest); err != nil {
return fmt.Errorf("cache unmarshal error: %w", err)
}
return nil
}
func (c *Cache) Set(ctx context.Context, key string, value interface{}, ttl time.Duration) error {
start := time.Now()
defer func() {
cacheOperationDuration.WithLabelValues("set", "redis").Observe(time.Since(start).Seconds())
}()
if ttl == 0 {
ttl = c.defaultTTL
}
data, err := json.Marshal(value)
if err != nil {
return fmt.Errorf("cache marshal error: %w", err)
}
if err := c.client.Set(ctx, key, data, ttl).Err(); err != nil {
return fmt.Errorf("cache set error: %w", err)
}
return nil
}
func (c *Cache) Delete(ctx context.Context, key string) error {
start := time.Now()
defer func() {
cacheOperationDuration.WithLabelValues("delete", "redis").Observe(time.Since(start).Seconds())
}()
if err := c.client.Del(ctx, key).Err(); err != nil {
return fmt.Errorf("cache delete error: %w", err)
}
return nil
}
func (c *Cache) GetOrSet(ctx context.Context, key string, dest interface{}, ttl time.Duration, fetchFn func() (interface{}, error)) error {
// Try to get from cache first
err := c.Get(ctx, key, dest)
if err == nil {
return nil // Cache hit
}
if err != ErrCacheMiss {
return err // Actual error
}
// Cache miss, fetch data
data, err := fetchFn()
if err != nil {
return fmt.Errorf("fetch function error: %w", err)
}
// Set in cache
if err := c.Set(ctx, key, data, ttl); err != nil {
// Log error but don't fail the request
log.Printf("Failed to set cache: %v", err)
}
	// Copy the fetched data into the caller's destination
	dataBytes, err := json.Marshal(data)
	if err != nil {
		return fmt.Errorf("cache marshal error: %w", err)
	}
	return json.Unmarshal(dataBytes, dest)
}
func (c *Cache) getKeyPattern(key string) string {
// Extract pattern from key for metrics
parts := strings.Split(key, ":")
if len(parts) >= 2 {
return parts[0] + ":" + parts[1]
}
return "unknown"
}
var ErrCacheMiss = fmt.Errorf("cache miss")
// Multi-level cache implementation
type MultiLevelCache struct {
l1Cache *sync.Map // In-memory cache
l2Cache *Cache // Redis cache
l1TTL time.Duration
l2TTL time.Duration
}
func NewMultiLevelCache(redisCache *Cache, l1TTL, l2TTL time.Duration) *MultiLevelCache {
return &MultiLevelCache{
l1Cache: &sync.Map{},
l2Cache: redisCache,
l1TTL: l1TTL,
l2TTL: l2TTL,
}
}
type cacheItem struct {
data interface{}
expiresAt time.Time
}
func (mlc *MultiLevelCache) Get(ctx context.Context, key string, dest interface{}) error {
// Try L1 cache first
if item, ok := mlc.l1Cache.Load(key); ok {
cacheItem := item.(*cacheItem)
if time.Now().Before(cacheItem.expiresAt) {
cacheHits.WithLabelValues("memory", "l1").Inc()
return mlc.copyData(cacheItem.data, dest)
}
// Expired, remove from L1
mlc.l1Cache.Delete(key)
}
// Try L2 cache
err := mlc.l2Cache.Get(ctx, key, dest)
if err == nil {
		// Promote to L1. Note: this stores the caller's dest value directly,
		// so callers must treat dest as read-only after Get returns.
		mlc.l1Cache.Store(key, &cacheItem{
			data:      dest,
			expiresAt: time.Now().Add(mlc.l1TTL),
		})
		return nil
}
return err
}
func (mlc *MultiLevelCache) Set(ctx context.Context, key string, value interface{}, ttl time.Duration) error {
// Set in L2 cache
if err := mlc.l2Cache.Set(ctx, key, value, ttl); err != nil {
return err
}
// Set in L1 cache
mlc.l1Cache.Store(key, &cacheItem{
data: value,
expiresAt: time.Now().Add(mlc.l1TTL),
})
return nil
}
func (mlc *MultiLevelCache) copyData(src, dest interface{}) error {
data, err := json.Marshal(src)
if err != nil {
return err
}
return json.Unmarshal(data, dest)
}
// Cache warming strategy
type CacheWarmer struct {
cache *Cache
warmers map[string]WarmupFunction
interval time.Duration
}
type WarmupFunction func(ctx context.Context) (map[string]interface{}, error)
func NewCacheWarmer(cache *Cache, interval time.Duration) *CacheWarmer {
return &CacheWarmer{
cache: cache,
warmers: make(map[string]WarmupFunction),
interval: interval,
}
}
func (cw *CacheWarmer) RegisterWarmer(name string, fn WarmupFunction) {
cw.warmers[name] = fn
}
func (cw *CacheWarmer) Start(ctx context.Context) {
ticker := time.NewTicker(cw.interval)
defer ticker.Stop()
// Initial warmup
cw.warmup(ctx)
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
cw.warmup(ctx)
}
}
}
func (cw *CacheWarmer) warmup(ctx context.Context) {
for name, warmer := range cw.warmers {
go func(name string, warmer WarmupFunction) {
data, err := warmer(ctx)
if err != nil {
log.Printf("Cache warmer %s failed: %v", name, err)
return
}
for key, value := range data {
if err := cw.cache.Set(ctx, key, value, 30*time.Minute); err != nil {
log.Printf("Failed to warm cache key %s: %v", key, err)
}
}
log.Printf("Cache warmer %s completed, warmed %d keys", name, len(data))
}(name, warmer)
}
}
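A usage sketch for the read-through GetOrSet helper above, appended to the same package; the User type and loadUser function are hypothetical stand-ins for real application code:

// cache/usage_example.go (illustrative sketch)
type User struct {
	ID    int    `json:"id"`
	Email string `json:"email"`
}

// loadUser stands in for the real database query.
func loadUser(ctx context.Context, id int) (*User, error) {
	return &User{ID: id, Email: "user@example.com"}, nil
}

func GetUser(ctx context.Context, c *Cache, id int) (*User, error) {
	var u User
	key := fmt.Sprintf("user:%d", id)
	err := c.GetOrSet(ctx, key, &u, 10*time.Minute, func() (interface{}, error) {
		return loadUser(ctx, id) // only runs on a cache miss
	})
	if err != nil {
		return nil, err
	}
	return &u, nil
}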
Kubernetes Resource Optimization
Resource Requests and Limits Optimization
# k8s/resource-optimization.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: resource-optimization-config
namespace: production
data:
# Resource optimization guidelines
guidelines.yaml: |
resource_guidelines:
cpu:
request_ratio: 0.1 # 10% of limit
limit_ratio: 1.0 # 100% of allocated
burst_allowance: 2.0 # 200% burst capacity
memory:
request_ratio: 0.8 # 80% of limit
limit_ratio: 1.0 # 100% of allocated
oom_kill_threshold: 0.95 # 95% before OOM
application_profiles:
web_frontend:
cpu_request: "100m"
cpu_limit: "500m"
memory_request: "128Mi"
memory_limit: "256Mi"
api_backend:
cpu_request: "200m"
cpu_limit: "1000m"
memory_request: "256Mi"
memory_limit: "512Mi"
database:
cpu_request: "500m"
cpu_limit: "2000m"
memory_request: "1Gi"
memory_limit: "2Gi"
cache:
cpu_request: "100m"
cpu_limit: "500m"
memory_request: "512Mi"
memory_limit: "1Gi"
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: optimized-application
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: optimized-application
template:
metadata:
labels:
app: optimized-application
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
containers:
- name: app
image: myapp:optimized
ports:
- containerPort: 8080
name: http
resources:
requests:
memory: "256Mi"
cpu: "200m"
ephemeral-storage: "1Gi"
limits:
memory: "512Mi"
cpu: "1000m"
ephemeral-storage: "2Gi"
        env:
        # The Go runtime honors both automatically (GOMEMLIMIT since Go 1.19);
        # resourceFieldRef yields whole cores and bytes respectively.
        - name: GOMAXPROCS
          valueFrom:
            resourceFieldRef:
              resource: limits.cpu
        - name: GOMEMLIMIT
          valueFrom:
            resourceFieldRef:
              resource: limits.memory
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
startupProbe:
httpGet:
path: /startup
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 30
securityContext:
allowPrivilegeEscalation: false
runAsNonRoot: true
runAsUser: 1000
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
volumeMounts:
- name: tmp
mountPath: /tmp
- name: cache
mountPath: /app/cache
volumes:
- name: tmp
emptyDir: {}
- name: cache
emptyDir:
sizeLimit: 1Gi
nodeSelector:
node-type: "compute-optimized"
tolerations:
- key: "high-performance"
operator: "Equal"
value: "true"
effect: "NoSchedule"
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- optimized-application
topologyKey: kubernetes.io/hostname
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: optimized-application-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: optimized-application
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
  # Serving this custom metric requires a metrics adapter (e.g. prometheus-adapter)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 50
periodSeconds: 60
- type: Pods
value: 2
periodSeconds: 60
selectPolicy: Max
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: optimized-application-vpa
namespace: production
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: optimized-application
  updatePolicy:
    # "Auto" lets the VPA evict and resize pods, which can fight an HPA that
    # scales on the same CPU/memory metrics; "Off" gives recommendations only.
    updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: app
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 2000m
memory: 1Gi
controlledResources: ["cpu", "memory"]
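Once the VPA has observed real traffic for a while, kubectl describe vpa optimized-application-vpa -n production shows its current recommendations; these are a useful starting point for right-sizing requests manually even when updateMode is set to "Off".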
Performance Monitoring and Alerting
# monitoring/performance-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: performance-alerts
namespace: monitoring
spec:
groups:
- name: performance.rules
rules:
# High latency alerts
- alert: HighResponseTime
expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 0.5
for: 2m
labels:
severity: warning
category: performance
annotations:
summary: "High response time detected"
description: "95th percentile response time is {{ $value }}s for {{ $labels.endpoint }}"
- alert: VeryHighResponseTime
expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1.0
for: 1m
labels:
severity: critical
category: performance
annotations:
summary: "Very high response time detected"
description: "95th percentile response time is {{ $value }}s for {{ $labels.endpoint }}"
# Throughput alerts
- alert: LowThroughput
expr: rate(http_requests_total[5m]) < 10
for: 5m
labels:
severity: warning
category: performance
annotations:
summary: "Low throughput detected"
description: "Request rate is {{ $value }} requests/second"
# Resource utilization alerts
- alert: HighCPUUtilization
expr: rate(container_cpu_usage_seconds_total[5m]) / container_spec_cpu_quota * container_spec_cpu_period > 0.8
for: 5m
labels:
severity: warning
category: performance
annotations:
summary: "High CPU utilization"
description: "CPU utilization is {{ $value | humanizePercentage }} for {{ $labels.pod }}"
- alert: HighMemoryUtilization
      expr: container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.9
for: 2m
labels:
severity: critical
category: performance
annotations:
summary: "High memory utilization"
description: "Memory utilization is {{ $value | humanizePercentage }} for {{ $labels.pod }}"
# Cache performance alerts
- alert: LowCacheHitRate
expr: rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) < 0.8
for: 5m
labels:
severity: warning
category: performance
annotations:
summary: "Low cache hit rate"
description: "Cache hit rate is {{ $value | humanizePercentage }} for {{ $labels.cache_type }}"
# Database performance alerts
- alert: SlowDatabaseQueries
expr: rate(postgresql_slow_queries_total[5m]) > 1
for: 2m
labels:
severity: warning
category: performance
annotations:
summary: "Slow database queries detected"
description: "{{ $value }} slow queries per second detected"
# Garbage collection alerts
- alert: HighGCPressure
expr: rate(go_gc_duration_seconds_sum[5m]) / rate(go_gc_duration_seconds_count[5m]) > 0.1
for: 5m
labels:
severity: warning
category: performance
annotations:
summary: "High garbage collection pressure"
description: "Average GC duration is {{ $value }}s"
---
apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-performance-dashboard
namespace: monitoring
data:
performance-dashboard.json: |
{
"dashboard": {
"id": null,
"title": "Application Performance Dashboard",
"tags": ["performance", "monitoring"],
"timezone": "browser",
"panels": [
{
"id": 1,
"title": "Response Time Percentiles",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))",
"legendFormat": "50th percentile"
},
{
"expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
"legendFormat": "95th percentile"
},
{
"expr": "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))",
"legendFormat": "99th percentile"
}
],
"yAxes": [
{
"label": "Response Time (seconds)",
"min": 0
}
]
},
{
"id": 2,
"title": "Request Rate",
"type": "graph",
"targets": [
{
"expr": "rate(http_requests_total[5m])",
"legendFormat": "{{ method }} {{ endpoint }}"
}
],
"yAxes": [
{
"label": "Requests per second",
"min": 0
}
]
},
{
"id": 3,
"title": "Error Rate",
"type": "graph",
"targets": [
{
"expr": "rate(http_requests_total{status_code=~\"5..\"}[5m]) / rate(http_requests_total[5m])",
"legendFormat": "Error rate"
}
],
"yAxes": [
{
"label": "Error rate",
"min": 0,
"max": 1
}
]
},
{
"id": 4,
"title": "Resource Utilization",
"type": "graph",
"targets": [
{
"expr": "rate(container_cpu_usage_seconds_total[5m])",
"legendFormat": "CPU usage - {{ pod }}"
},
{
"expr": "container_memory_usage_bytes / 1024 / 1024",
"legendFormat": "Memory usage (MB) - {{ pod }}"
}
]
},
{
"id": 5,
"title": "Cache Performance",
"type": "graph",
"targets": [
{
"expr": "rate(cache_hits_total[5m])",
"legendFormat": "Cache hits/sec - {{ cache_type }}"
},
{
"expr": "rate(cache_misses_total[5m])",
"legendFormat": "Cache misses/sec - {{ cache_type }}"
}
]
}
],
"time": {
"from": "now-1h",
"to": "now"
},
"refresh": "30s"
}
}
Load Testing and Performance Validation
Comprehensive Load Testing Script
#!/bin/bash
# scripts/load-test.sh
set -euo pipefail
# Configuration
TARGET_URL="${1:-http://localhost:8080}"
TEST_DURATION="${2:-300}" # 5 minutes
MAX_VUS="${3:-100}" # Virtual users
RAMP_UP_TIME="${4:-60}" # Ramp up time in seconds
TEST_TYPE="${5:-load}" # load, stress, spike, volume
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
log() {
echo -e "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}
success() {
log "${GREEN}✓ $1${NC}"
}
warning() {
log "${YELLOW}⚠ $1${NC}"
}
error() {
log "${RED}✗ $1${NC}"
}
info() {
log "${BLUE}ℹ $1${NC}"
}
# Create k6 test script
create_k6_script() {
local test_type="$1"
local script_file="load-test-${test_type}.js"
cat > "${script_file}" << 'EOF'
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend, Counter } from 'k6/metrics';
import { textSummary } from 'https://jslib.k6.io/k6-summary/0.0.1/index.js';
// Custom metrics
const errorRate = new Rate('error_rate');
const responseTime = new Trend('response_time');
const requestCount = new Counter('request_count');
// Test configuration
export let options = {
stages: [
{ duration: '${RAMP_UP_TIME}s', target: ${MAX_VUS} },
{ duration: '${TEST_DURATION}s', target: ${MAX_VUS} },
{ duration: '60s', target: 0 },
],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    http_req_failed: ['rate<0.01'],
    error_rate: ['rate<0.01'],
  },
  // Make sure p(99) appears in the exported summary for analyze_results
  summaryTrendStats: ['avg', 'min', 'med', 'max', 'p(90)', 'p(95)', 'p(99)'],
};
// Alternative executor configurations for the other test types; to use one,
// replace the `stages` block in options above with
// `scenarios: { <name>: scenarios.<name> }`.
const scenarios = {
load_test: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '${RAMP_UP_TIME}s', target: ${MAX_VUS} },
{ duration: '${TEST_DURATION}s', target: ${MAX_VUS} },
{ duration: '60s', target: 0 },
],
},
stress_test: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '60s', target: ${MAX_VUS} },
{ duration: '120s', target: ${MAX_VUS} * 2 },
{ duration: '180s', target: ${MAX_VUS} * 3 },
{ duration: '60s', target: 0 },
],
},
spike_test: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '60s', target: ${MAX_VUS} },
{ duration: '10s', target: ${MAX_VUS} * 5 },
{ duration: '60s', target: ${MAX_VUS} },
{ duration: '60s', target: 0 },
],
},
};
export default function() {
const baseUrl = '${TARGET_URL}';
// Test different endpoints with realistic load distribution
const endpoints = [
{ url: `${baseUrl}/api/users`, weight: 30 },
{ url: `${baseUrl}/api/orders`, weight: 25 },
{ url: `${baseUrl}/api/products`, weight: 20 },
{ url: `${baseUrl}/api/search`, weight: 15 },
{ url: `${baseUrl}/api/analytics`, weight: 10 },
];
// Select endpoint based on weight
const random = Math.random() * 100;
let cumulative = 0;
let selectedEndpoint = endpoints[0].url;
for (const endpoint of endpoints) {
cumulative += endpoint.weight;
if (random <= cumulative) {
selectedEndpoint = endpoint.url;
break;
}
}
// Make request
const response = http.get(selectedEndpoint, {
headers: {
'User-Agent': 'k6-load-test',
'Accept': 'application/json',
},
timeout: '30s',
});
// Record metrics
requestCount.add(1);
responseTime.add(response.timings.duration);
// Validate response
const success = check(response, {
'status is 200': (r) => r.status === 200,
'response time < 1000ms': (r) => r.timings.duration < 1000,
'response has body': (r) => r.body && r.body.length > 0,
});
errorRate.add(!success);
// Realistic user behavior - think time
sleep(Math.random() * 2 + 1); // 1-3 seconds
}
export function handleSummary(data) {
return {
'load-test-results.json': JSON.stringify(data, null, 2),
'load-test-summary.txt': textSummary(data, { indent: ' ', enableColors: true }),
};
}
EOF
# Replace placeholders
sed -i.bak "s/\${TARGET_URL}/${TARGET_URL//\//\\/}/g" "${script_file}"
sed -i.bak "s/\${TEST_DURATION}/${TEST_DURATION}/g" "${script_file}"
sed -i.bak "s/\${MAX_VUS}/${MAX_VUS}/g" "${script_file}"
sed -i.bak "s/\${RAMP_UP_TIME}/${RAMP_UP_TIME}/g" "${script_file}"
rm "${script_file}.bak"
echo "${script_file}"
}
# Performance baseline measurement
measure_baseline() {
    info "Measuring performance baseline..."
    # Single-user run; --summary-export writes an easily parsed JSON summary
    k6 run --vus 1 --duration 60s --quiet --summary-export baseline-summary.json "$(create_k6_script baseline)"
    local baseline_p95
    baseline_p95=$(jq -r '.metrics.http_req_duration["p(95)"]' baseline-summary.json)
    local baseline_rps
    baseline_rps=$(jq -r '.metrics.http_reqs.rate' baseline-summary.json)
    info "Baseline P95 response time: ${baseline_p95}ms"
    info "Baseline RPS: ${baseline_rps}"
    echo "${baseline_p95}" > baseline-p95.txt
    echo "${baseline_rps}" > baseline-rps.txt
}
# Run load test
run_load_test() {
local test_type="$1"
info "Running ${test_type} test..."
info "Target: ${TARGET_URL}"
info "Duration: ${TEST_DURATION}s"
info "Max VUs: ${MAX_VUS}"
info "Ramp up: ${RAMP_UP_TIME}s"
local script_file
script_file=$(create_k6_script "${test_type}")
    # Run k6; raw metric points go to raw-results.json for later graphing
    if k6 run --out json=raw-results.json "${script_file}"; then
success "${test_type} test completed successfully"
return 0
else
error "${test_type} test failed"
return 1
fi
}
# Analyze results
analyze_results() {
if [[ ! -f "load-test-results.json" ]]; then
error "Results file not found"
return 1
fi
info "Analyzing test results..."
# Extract key metrics
local p95_response_time
local p99_response_time
local avg_rps
local error_rate
local max_vus
    p95_response_time=$(jq -r '.metrics.http_req_duration.values["p(95)"]' load-test-results.json)
    p99_response_time=$(jq -r '.metrics.http_req_duration.values["p(99)"]' load-test-results.json)
    avg_rps=$(jq -r '.metrics.http_reqs.values.rate' load-test-results.json)
    error_rate=$(jq -r '.metrics.http_req_failed.values.rate' load-test-results.json)
    max_vus=$(jq -r '.metrics.vus_max.values.max' load-test-results.json)
# Performance report
cat << EOF > performance-report.txt
Performance Test Report
======================
Test Type: ${TEST_TYPE}
Target URL: ${TARGET_URL}
Test Duration: ${TEST_DURATION}s
Max Virtual Users: ${max_vus}
Key Metrics:
- P95 Response Time: ${p95_response_time}ms
- P99 Response Time: ${p99_response_time}ms
- Average RPS: ${avg_rps}
- Error Rate: $(awk -v r="${error_rate}" 'BEGIN { printf "%.2f", r * 100 }')%
Performance Thresholds:
- P95 < 500ms: $(if (( $(echo "${p95_response_time} < 500" | bc -l) )); then echo "✓ PASS"; else echo "✗ FAIL"; fi)
- P99 < 1000ms: $(if (( $(echo "${p99_response_time} < 1000" | bc -l) )); then echo "✓ PASS"; else echo "✗ FAIL"; fi)
- Error Rate < 1%: $(if (( $(echo "${error_rate} < 0.01" | bc -l) )); then echo "✓ PASS"; else echo "✗ FAIL"; fi)
EOF
# Compare with baseline if available
if [[ -f "baseline-p95.txt" ]]; then
local baseline_p95
baseline_p95=$(cat baseline-p95.txt)
local degradation
degradation=$(echo "scale=2; (${p95_response_time} - ${baseline_p95}) / ${baseline_p95} * 100" | bc -l)
echo "Baseline Comparison:" >> performance-report.txt
echo "- P95 Degradation: ${degradation}%" >> performance-report.txt
if (( $(echo "${degradation} > 20" | bc -l) )); then
warning "Performance degradation detected: ${degradation}%"
fi
fi
cat performance-report.txt
}
# Generate performance graphs (needs the raw metric stream from k6 --out json)
generate_graphs() {
    if command -v gnuplot &> /dev/null && [[ -f "raw-results.json" ]]; then
        info "Generating performance graphs..."
        # Extract per-request response times; the summary JSON only holds
        # aggregates, so the raw point stream is used instead
        jq -r 'select(.type == "Point" and .metric == "http_req_duration") | .data.value' raw-results.json > response-time-data.txt
        # Create gnuplot script
        cat << 'EOF' > plot-performance.gp
set terminal png size 1200,800
set output 'performance-graph.png'
set title 'Load Test Performance Results'
set xlabel 'Request (in completion order)'
set ylabel 'Response Time (ms)'
set grid
plot 'response-time-data.txt' using 0:1 with lines title 'Response Time'
EOF
        gnuplot plot-performance.gp
        success "Performance graph generated: performance-graph.png"
    fi
}
# Cleanup function
cleanup() {
    rm -f load-test-*.js
    rm -f baseline-summary.json
    rm -f raw-results.json
    rm -f response-time-data.txt
    rm -f plot-performance.gp
}
# Main execution
main() {
info "Starting performance testing suite"
    # Check dependencies (k6 for load generation, jq and bc for analysis)
    for tool in k6 jq bc; do
        if ! command -v "${tool}" &> /dev/null; then
            error "${tool} is not installed. Please install ${tool} first."
            exit 1
        fi
    done
# Measure baseline if not exists
if [[ ! -f "baseline-p95.txt" ]]; then
measure_baseline
fi
# Run the specified test
if run_load_test "${TEST_TYPE}"; then
analyze_results
generate_graphs
success "Performance testing completed successfully"
else
error "Performance testing failed"
exit 1
fi
}
# Remove temporary files when the script exits
trap cleanup EXIT
# Run main function
main "$@"
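A typical invocation against a staging environment looks like ./scripts/load-test.sh https://staging.example.com 600 200 120 stress, passing the target URL, test duration, maximum virtual users, ramp-up time, and test type in that order (the hostname here is a placeholder).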
Conclusion
Performance optimization for cloud-native applications requires a systematic approach that encompasses:
- Comprehensive Profiling: Using tools like pprof, distributed tracing, and application metrics to identify bottlenecks
- Database Optimization: Query optimization, indexing strategies, and connection pooling
- Caching Strategies: Multi-level caching, cache warming, and intelligent cache invalidation
- Resource Optimization: Right-sizing containers, implementing auto-scaling, and optimizing resource allocation
- Continuous Monitoring: Real-time performance monitoring, alerting, and automated remediation
- Load Testing: Regular performance validation and capacity planning
Key success factors for performance optimization:
- Establish baselines and continuously monitor performance metrics
- Implement observability at all levels of the application stack
- Automate optimization where possible to reduce manual intervention
- Test regularly under realistic load conditions
- Optimize iteratively based on data-driven insights
By following these practices and implementing the solutions outlined in this guide, organizations can achieve optimal performance for their cloud-native applications while maintaining scalability and reliability.