PostgreSQL高可用架构设计与实践:从主从复制到分布式集群
在现代企业级应用中,数据库的高可用性是系统稳定运行的基石。PostgreSQL作为功能强大的开源关系型数据库,提供了多种高可用解决方案。本文将深入探讨PostgreSQL高可用架构的设计原理和实践方案,从基础的主从复制到复杂的分布式集群部署。
PostgreSQL高可用架构概述
高可用性指标定义
在设计PostgreSQL高可用架构时,我们需要明确以下关键指标:
| 指标 | 定义 | 目标值 | 影响因素 |
|---|---|---|---|
| 可用性(Availability) | 系统正常运行时间比例 | 99.9% - 99.99% | 故障检测、切换时间 |
| 恢复时间目标(RTO) | 系统故障后恢复服务的最大时间 | < 5分钟 | 自动化程度、网络延迟 |
| 恢复点目标(RPO) | 可接受的最大数据丢失量 | < 1分钟 | 复制延迟、同步策略 |
| 故障检测时间(MTTR) | 从故障发生到检测到的时间 | < 30秒 | 监控频率、健康检查 |
高可用架构模式
graph TB
subgraph "单机模式"
A[Primary Server]
A1[定期备份]
A --> A1
end
subgraph "主从复制模式"
B[Primary Server]
B1[Standby Server 1]
B2[Standby Server 2]
B --> B1
B --> B2
end
subgraph "主从+负载均衡模式"
C[Primary Server]
C1[Standby Server 1]
C2[Standby Server 2]
C3[Load Balancer]
C3 --> C
C3 --> C1
C3 --> C2
C --> C1
C --> C2
end
subgraph "分布式集群模式"
D[Node 1]
D1[Node 2]
D2[Node 3]
D3[Coordinator]
D3 --> D
D3 --> D1
D3 --> D2
D <--> D1
D1 <--> D2
D2 <--> D
end
流复制(Streaming Replication)配置
主服务器配置
首先配置主服务器的PostgreSQL参数:
# postgresql.conf - 主服务器配置
# 基础配置
listen_addresses = '*'
port = 5432
max_connections = 200
# WAL配置
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
wal_keep_segments = 64
wal_sender_timeout = 60s
# 归档配置
archive_mode = on
archive_command = 'cp %p /var/lib/postgresql/archive/%f'
archive_timeout = 300
# 检查点配置
checkpoint_completion_target = 0.9
checkpoint_timeout = 15min
max_wal_size = 2GB
min_wal_size = 1GB
# 日志配置
log_destination = 'stderr'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 100MB
log_min_duration_statement = 1000
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_statement = 'ddl'
# 性能优化
shared_buffers = 256MB
effective_cache_size = 1GB
work_mem = 4MB
maintenance_work_mem = 64MB
配置复制用户权限:
-- 创建复制用户
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'replica_password';
-- 授予必要权限
GRANT CONNECT ON DATABASE postgres TO replicator;
GRANT USAGE ON SCHEMA public TO replicator;
配置客户端认证:
# pg_hba.conf - 主服务器
# 本地连接
local all postgres peer
local all all md5
# IPv4本地连接
host all all 127.0.0.1/32 md5
host all all 10.0.0.0/8 md5
# 复制连接
host replication replicator 10.0.0.0/8 md5
host replication replicator 192.168.0.0/16 md5
从服务器配置
配置从服务器的复制参数:
# postgresql.conf - 从服务器配置
# 基础配置
listen_addresses = '*'
port = 5432
max_connections = 200
# 热备配置
hot_standby = on
hot_standby_feedback = on
max_standby_streaming_delay = 30s
max_standby_archive_delay = 30s
# WAL配置
wal_level = replica
max_wal_senders = 5
wal_receiver_status_interval = 10s
wal_receiver_timeout = 60s
# 恢复配置
restore_command = 'cp /var/lib/postgresql/archive/%f %p'
recovery_target_timeline = 'latest'
# 日志配置
log_min_duration_statement = 1000
log_standby_delays = on
创建恢复配置文件:
# recovery.conf - 从服务器
standby_mode = 'on'
primary_conninfo = 'host=10.0.1.10 port=5432 user=replicator password=replica_password application_name=standby1'
primary_slot_name = 'standby1_slot'
trigger_file = '/var/lib/postgresql/trigger_file'
recovery_target_timeline = 'latest'
复制槽管理
在主服务器上创建复制槽:
-- 创建复制槽
SELECT pg_create_physical_replication_slot('standby1_slot');
SELECT pg_create_physical_replication_slot('standby2_slot');
-- 查看复制槽状态
SELECT slot_name, slot_type, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
-- 查看复制状态
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
write_lag, flush_lag, replay_lag
FROM pg_stat_replication;
自动故障转移实现
Patroni集群管理
使用Patroni实现PostgreSQL集群的自动故障转移:
# patroni.yml - Patroni配置文件
scope: postgres-cluster
namespace: /db/
name: postgres-node1
restapi:
listen: 0.0.0.0:8008
connect_address: 10.0.1.10:8008
etcd:
hosts: 10.0.1.20:2379,10.0.1.21:2379,10.0.1.22:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 30
maximum_lag_on_failover: 1048576
master_start_timeout: 300
synchronous_mode: true
synchronous_mode_strict: false
synchronous_node_count: 1
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
wal_level: replica
hot_standby: "on"
max_connections: 200
max_wal_senders: 10
max_replication_slots: 10
wal_keep_segments: 64
logging_collector: "on"
log_destination: stderr
log_directory: log
log_filename: postgresql-%Y-%m-%d_%H%M%S.log
log_min_duration_statement: 1000
initdb:
- encoding: UTF8
- data-checksums
pg_hba:
- host replication replicator 127.0.0.1/32 md5
- host replication replicator 10.0.1.0/24 md5
- host all all 0.0.0.0/0 md5
users:
admin:
password: admin_password
options:
- createrole
- createdb
postgresql:
listen: 0.0.0.0:5432
connect_address: 10.0.1.10:5432
data_dir: /var/lib/postgresql/13/main
bin_dir: /usr/lib/postgresql/13/bin
pgpass: /tmp/pgpass
authentication:
replication:
username: replicator
password: replica_password
superuser:
username: postgres
password: postgres_password
parameters:
unix_socket_directories: '/var/run/postgresql'
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
故障检测脚本
#!/bin/bash
# scripts/health_check.sh
set -euo pipefail
# 配置参数
POSTGRES_HOST="${POSTGRES_HOST:-localhost}"
POSTGRES_PORT="${POSTGRES_PORT:-5432}"
POSTGRES_USER="${POSTGRES_USER:-postgres}"
POSTGRES_DB="${POSTGRES_DB:-postgres}"
CHECK_INTERVAL="${CHECK_INTERVAL:-10}"
FAILURE_THRESHOLD="${FAILURE_THRESHOLD:-3}"
RECOVERY_THRESHOLD="${RECOVERY_THRESHOLD:-2}"
# 状态文件
STATE_FILE="/tmp/postgres_health_state"
FAILURE_COUNT_FILE="/tmp/postgres_failure_count"
RECOVERY_COUNT_FILE="/tmp/postgres_recovery_count"
# 日志函数
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" >&2
}
# 健康检查函数
check_postgres_health() {
local exit_code=0
# 检查连接性
if ! pg_isready -h "${POSTGRES_HOST}" -p "${POSTGRES_PORT}" -U "${POSTGRES_USER}" -q; then
log "ERROR: PostgreSQL connection failed"
exit_code=1
fi
# 检查查询响应
if ! psql -h "${POSTGRES_HOST}" -p "${POSTGRES_PORT}" -U "${POSTGRES_USER}" -d "${POSTGRES_DB}" \
-c "SELECT 1;" -t -A -q > /dev/null 2>&1; then
log "ERROR: PostgreSQL query test failed"
exit_code=1
fi
# 检查复制延迟
local replication_lag
replication_lag=$(psql -h "${POSTGRES_HOST}" -p "${POSTGRES_PORT}" -U "${POSTGRES_USER}" -d "${POSTGRES_DB}" \
-c "SELECT CASE WHEN pg_is_in_recovery() THEN
EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::int
ELSE 0 END;" -t -A -q 2>/dev/null || echo "999")
if [[ "${replication_lag}" -gt 60 ]]; then
log "WARNING: High replication lag: ${replication_lag}s"
exit_code=1
fi
# 检查磁盘空间
local disk_usage
disk_usage=$(df /var/lib/postgresql | tail -1 | awk '{print $5}' | sed 's/%//')
if [[ "${disk_usage}" -gt 90 ]]; then
log "ERROR: High disk usage: ${disk_usage}%"
exit_code=1
fi
return ${exit_code}
}
# 故障处理函数
handle_failure() {
local failure_count
failure_count=$(cat "${FAILURE_COUNT_FILE}" 2>/dev/null || echo "0")
failure_count=$((failure_count + 1))
echo "${failure_count}" > "${FAILURE_COUNT_FILE}"
log "Health check failed (${failure_count}/${FAILURE_THRESHOLD})"
if [[ "${failure_count}" -ge "${FAILURE_THRESHOLD}" ]]; then
log "CRITICAL: Failure threshold reached, triggering failover"
echo "FAILED" > "${STATE_FILE}"
# 触发故障转移
trigger_failover
fi
}
# 恢复处理函数
handle_recovery() {
local recovery_count
recovery_count=$(cat "${RECOVERY_COUNT_FILE}" 2>/dev/null || echo "0")
recovery_count=$((recovery_count + 1))
echo "${recovery_count}" > "${RECOVERY_COUNT_FILE}"
log "Health check passed (${recovery_count}/${RECOVERY_THRESHOLD})"
if [[ "${recovery_count}" -ge "${RECOVERY_THRESHOLD}" ]]; then
log "INFO: Recovery threshold reached, marking as healthy"
echo "HEALTHY" > "${STATE_FILE}"
echo "0" > "${FAILURE_COUNT_FILE}"
echo "0" > "${RECOVERY_COUNT_FILE}"
fi
}
# 故障转移触发函数
trigger_failover() {
log "Triggering PostgreSQL failover..."
# 使用Patroni API触发故障转移
if command -v curl &> /dev/null; then
curl -s -X POST http://localhost:8008/failover \
-H "Content-Type: application/json" \
-d '{"leader": "postgres-node1", "candidate": "postgres-node2"}' || true
fi
# 发送告警通知
send_alert "PostgreSQL failover triggered on $(hostname)"
}
# 告警通知函数
send_alert() {
local message="$1"
# 发送到Slack
if [[ -n "${SLACK_WEBHOOK_URL:-}" ]]; then
curl -X POST "${SLACK_WEBHOOK_URL}" \
-H 'Content-type: application/json' \
-d "{\"text\":\"🚨 PostgreSQL Alert: ${message}\"}" || true
fi
# 发送邮件
if command -v mail &> /dev/null; then
echo "${message}" | mail -s "PostgreSQL Alert" "${ALERT_EMAIL:-admin@company.com}" || true
fi
log "Alert sent: ${message}"
}
# 主循环
main() {
log "Starting PostgreSQL health monitoring"
while true; do
if check_postgres_health; then
local current_state
current_state=$(cat "${STATE_FILE}" 2>/dev/null || echo "UNKNOWN")
if [[ "${current_state}" != "HEALTHY" ]]; then
handle_recovery
else
# 重置计数器
echo "0" > "${FAILURE_COUNT_FILE}"
echo "0" > "${RECOVERY_COUNT_FILE}"
fi
else
handle_failure
fi
sleep "${CHECK_INTERVAL}"
done
}
# 信号处理
cleanup() {
log "Health monitor shutting down"
exit 0
}
trap cleanup SIGTERM SIGINT
# 启动监控
main "$@"
负载均衡配置
HAProxy配置
# /etc/haproxy/haproxy.cfg
global
daemon
maxconn 4096
log stdout local0
stats socket /var/run/haproxy.sock mode 600 level admin
stats timeout 2m
defaults
mode tcp
timeout connect 5s
timeout client 30s
timeout server 30s
option tcplog
log global
# PostgreSQL主库负载均衡
frontend postgres_primary
bind *:5432
default_backend postgres_primary_servers
backend postgres_primary_servers
balance roundrobin
option tcp-check
tcp-check connect
tcp-check send-binary 00000020 # PostgreSQL startup packet length
tcp-check send-binary 00030000 # Protocol version 3.0
tcp-check send-binary 7573657200 # "user\0"
tcp-check send-binary 68656c6c6f00 # "hello\0"
tcp-check send-binary 00 # End of startup packet
tcp-check expect binary 52 # Authentication request
server postgres-primary 10.0.1.10:5432 check inter 5s rise 2 fall 3
server postgres-standby1 10.0.1.11:5432 check inter 5s rise 2 fall 3 backup
server postgres-standby2 10.0.1.12:5432 check inter 5s rise 2 fall 3 backup
# PostgreSQL只读负载均衡
frontend postgres_readonly
bind *:5433
default_backend postgres_readonly_servers
backend postgres_readonly_servers
balance roundrobin
option tcp-check
tcp-check connect
tcp-check send-binary 00000020
tcp-check send-binary 00030000
tcp-check send-binary 7573657200
tcp-check send-binary 68656c6c6f00
tcp-check send-binary 00
tcp-check expect binary 52
server postgres-standby1 10.0.1.11:5432 check inter 5s rise 2 fall 3
server postgres-standby2 10.0.1.12:5432 check inter 5s rise 2 fall 3
server postgres-primary 10.0.1.10:5432 check inter 5s rise 2 fall 3 backup
# 统计页面
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 30s
stats admin if TRUE
PgBouncer连接池
# /etc/pgbouncer/pgbouncer.ini
[databases]
production = host=10.0.1.10 port=5432 dbname=production
production_ro = host=10.0.1.11 port=5432 dbname=production
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
admin_users = pgbouncer_admin
stats_users = pgbouncer_stats
# 连接池配置
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 5
# 服务器配置
server_reset_query = DISCARD ALL
server_check_query = SELECT 1
server_check_delay = 30
server_connect_timeout = 15
server_login_retry = 15
# 客户端配置
client_login_timeout = 60
autodb_idle_timeout = 3600
dns_max_ttl = 15
dns_nxdomain_ttl = 15
# 日志配置
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
syslog = 1
syslog_facility = daemon
syslog_ident = pgbouncer
监控和告警
Prometheus监控配置
# monitoring/postgres-exporter.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: postgres-exporter
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: postgres-exporter
template:
metadata:
labels:
app: postgres-exporter
spec:
containers:
- name: postgres-exporter
image: prometheuscommunity/postgres-exporter:latest
ports:
- containerPort: 9187
name: metrics
env:
- name: DATA_SOURCE_NAME
value: "postgresql://postgres_exporter:password@postgres-primary:5432/postgres?sslmode=disable"
- name: PG_EXPORTER_EXTEND_QUERY_PATH
value: "/etc/postgres_exporter/queries.yaml"
volumeMounts:
- name: queries
mountPath: /etc/postgres_exporter
resources:
requests:
memory: "64Mi"
cpu: "50m"
limits:
memory: "128Mi"
cpu: "100m"
volumes:
- name: queries
configMap:
name: postgres-exporter-queries
---
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-exporter-queries
namespace: monitoring
data:
queries.yaml: |
pg_replication:
query: "SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn, write_lag, flush_lag, replay_lag FROM pg_stat_replication"
master: true
metrics:
- client_addr:
usage: "LABEL"
description: "Client address"
- state:
usage: "LABEL"
description: "Replication state"
- write_lag:
usage: "GAUGE"
description: "Write lag in seconds"
- flush_lag:
usage: "GAUGE"
description: "Flush lag in seconds"
- replay_lag:
usage: "GAUGE"
description: "Replay lag in seconds"
pg_database_size:
query: "SELECT datname, pg_database_size(datname) as size_bytes FROM pg_database WHERE datistemplate = false"
master: true
metrics:
- datname:
usage: "LABEL"
description: "Database name"
- size_bytes:
usage: "GAUGE"
description: "Database size in bytes"
pg_slow_queries:
query: "SELECT query, calls, total_time, mean_time FROM pg_stat_statements WHERE mean_time > 1000 ORDER BY mean_time DESC LIMIT 10"
master: true
metrics:
- query:
usage: "LABEL"
description: "SQL query"
- calls:
usage: "COUNTER"
description: "Number of calls"
- total_time:
usage: "COUNTER"
description: "Total execution time"
- mean_time:
usage: "GAUGE"
description: "Mean execution time"
告警规则配置
# monitoring/postgres-alerts.yml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: postgres-alerts
namespace: monitoring
spec:
groups:
- name: postgres.rules
rules:
# 数据库连接告警
- alert: PostgreSQLDown
expr: pg_up == 0
for: 1m
labels:
severity: critical
service: postgresql
annotations:
summary: "PostgreSQL instance is down"
description: "PostgreSQL instance {{ $labels.instance }} has been down for more than 1 minute"
# 复制延迟告警
- alert: PostgreSQLReplicationLag
expr: pg_replication_lag > 60
for: 2m
labels:
severity: warning
service: postgresql
annotations:
summary: "PostgreSQL replication lag is high"
description: "PostgreSQL replication lag is {{ $value }}s on {{ $labels.instance }}"
# 连接数告警
- alert: PostgreSQLTooManyConnections
expr: pg_stat_database_numbackends / pg_settings_max_connections > 0.8
for: 5m
labels:
severity: warning
service: postgresql
annotations:
summary: "PostgreSQL has too many connections"
description: "PostgreSQL instance {{ $labels.instance }} has {{ $value | humanizePercentage }} connections used"
# 磁盘空间告警
- alert: PostgreSQLDiskSpaceHigh
expr: (node_filesystem_size_bytes{mountpoint="/var/lib/postgresql"} - node_filesystem_free_bytes{mountpoint="/var/lib/postgresql"}) / node_filesystem_size_bytes{mountpoint="/var/lib/postgresql"} > 0.9
for: 5m
labels:
severity: critical
service: postgresql
annotations:
summary: "PostgreSQL disk space is running low"
description: "PostgreSQL disk usage is {{ $value | humanizePercentage }} on {{ $labels.instance }}"
# 慢查询告警
- alert: PostgreSQLSlowQueries
expr: rate(pg_stat_statements_total_time[5m]) / rate(pg_stat_statements_calls[5m]) > 1000
for: 5m
labels:
severity: warning
service: postgresql
annotations:
summary: "PostgreSQL has slow queries"
description: "PostgreSQL average query time is {{ $value }}ms on {{ $labels.instance }}"
性能优化和调优
数据库参数优化
-- 性能优化配置
-- 内存配置
ALTER SYSTEM SET shared_buffers = '256MB';
ALTER SYSTEM SET effective_cache_size = '1GB';
ALTER SYSTEM SET work_mem = '4MB';
ALTER SYSTEM SET maintenance_work_mem = '64MB';
-- 检查点配置
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET checkpoint_timeout = '15min';
ALTER SYSTEM SET max_wal_size = '2GB';
ALTER SYSTEM SET min_wal_size = '1GB';
-- 并发配置
ALTER SYSTEM SET max_connections = 200;
ALTER SYSTEM SET max_worker_processes = 8;
ALTER SYSTEM SET max_parallel_workers = 8;
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
-- 查询优化配置
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET effective_io_concurrency = 200;
ALTER SYSTEM SET default_statistics_target = 100;
-- 应用配置
SELECT pg_reload_conf();
索引优化策略
-- 创建性能监控视图
CREATE OR REPLACE VIEW pg_index_usage AS
SELECT
schemaname,
tablename,
indexname,
idx_tup_read,
idx_tup_fetch,
idx_scan,
CASE
WHEN idx_scan = 0 THEN 'Unused'
WHEN idx_scan < 10 THEN 'Rarely Used'
WHEN idx_scan < 100 THEN 'Moderately Used'
ELSE 'Frequently Used'
END as usage_category
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- 查找未使用的索引
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND schemaname NOT IN ('information_schema', 'pg_catalog')
ORDER BY pg_relation_size(indexrelid) DESC;
-- 查找重复索引
WITH index_columns AS (
SELECT
i.indexrelid,
i.indrelid,
i.indkey,
array_to_string(array_agg(a.attname ORDER BY a.attnum), ',') as columns
FROM pg_index i
JOIN pg_attribute a ON a.attrelid = i.indrelid AND a.attnum = ANY(i.indkey)
WHERE i.indisvalid
GROUP BY i.indexrelid, i.indrelid, i.indkey
)
SELECT
n.nspname as schema_name,
c.relname as table_name,
ci1.relname as index1,
ci2.relname as index2,
ic.columns
FROM index_columns ic
JOIN pg_class c ON c.oid = ic.indrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_class ci1 ON ci1.oid = ic.indexrelid
JOIN index_columns ic2 ON ic2.indrelid = ic.indrelid
AND ic2.columns = ic.columns
AND ic2.indexrelid > ic.indexrelid
JOIN pg_class ci2 ON ci2.oid = ic2.indexrelid
WHERE n.nspname NOT IN ('information_schema', 'pg_catalog')
ORDER BY n.nspname, c.relname;
总结
PostgreSQL高可用架构的实现需要综合考虑多个方面:
- 复制策略:选择合适的复制模式(异步、同步、半同步)
- 故障检测:实现快速、准确的故障检测机制
- 自动切换:使用Patroni等工具实现自动故障转移
- 负载均衡:合理分配读写请求,提高系统性能
- 监控告警:建立完善的监控体系,及时发现和处理问题
- 性能优化:持续优化数据库参数和查询性能
通过本文介绍的方案,可以构建一个稳定、高效的PostgreSQL高可用系统,满足企业级应用的需求。在实际部署时,需要根据具体的业务场景和性能要求进行适当调整。