# 🧰 Ceph Operations Cheat Sheet
This page provides quick reference commands and practices for monitoring, troubleshooting, and maintaining your Ceph cluster in the homelab.
## 📊 Monitoring Cluster Health
- Cluster status:
  ```bash
  ceph -s
  ```
- Detailed health report:
  ```bash
  ceph health detail
  ```
- Monitor quorum status:
  ```bash
  ceph quorum_status --format json-pretty
  ```
- OSD layout:
  ```bash
  ceph osd tree
  ```
- Pool usage and capacity:
  ```bash
  ceph df
  ```
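
For a continuously updating view while you work on the cluster, two standard options are `watch` and Ceph's built-in follow mode (a small sketch; the 5-second interval is just an example):

```bash
# Keep a live view of cluster state open during maintenance (refreshes every 5 s)
watch -n 5 ceph -s

# Or stream health and cluster log events as they happen
ceph -w
```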
## 📦 Pool & Placement Groups
- List pools:
  ```bash
  ceph osd pool ls
  ```
- Pool details:
  ```bash
  ceph osd pool get <pool-name> all
  ```
- Placement group statistics:
  ```bash
  ceph pg stat
  ```
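
If you create a new pool, letting the PG autoscaler manage `pg_num` saves manual tuning. A minimal sketch, assuming a replicated RBD pool; the pool name `rbd-vms` and the initial PG count are examples:

```bash
# Create a replicated pool with an initial PG count (the autoscaler can adjust it later)
ceph osd pool create rbd-vms 32 32 replicated
ceph osd pool application enable rbd-vms rbd
ceph osd pool set rbd-vms pg_autoscale_mode on

# Review what the autoscaler thinks each pool should have
ceph osd pool autoscale-status
```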
## ⚙️ OSD Management
- List OSDs:
  ```bash
  ceph osd ls
  ```
- Check OSD status:
  ```bash
  ceph osd stat
  ```
- Restart an OSD service:
  ```bash
  systemctl restart ceph-osd@<id>
  ```
- Mark OSD out/in (see the maintenance sketch after this list):
  ```bash
  ceph osd out <id>
  ceph osd in <id>
  ```
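
When taking a node down briefly (reboots, disk swaps), it is usually better to suppress rebalancing than to mark OSDs out by hand. A minimal sketch using the standard `noout` flag:

```bash
# Prevent OSDs from being marked "out" while a node is down for maintenance
ceph osd set noout

# ...reboot or service the node...

# Restore normal behaviour and confirm the cluster recovers
ceph osd unset noout
ceph -s
```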
## 🔍 Troubleshooting Common Issues
### OSD Down / Out
- Restart the OSD service:
  ```bash
  systemctl restart ceph-osd@<id>
  ```
- Replace the failed disk and recreate the OSD if necessary (a sketch follows below).
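
One possible disk-replacement flow, assuming the failed OSD has ID 7 and the cluster is managed through Proxmox VE; the ID and device path are placeholders:

```bash
# Remove the dead OSD from the cluster
ceph osd out 7
systemctl stop ceph-osd@7
ceph osd purge 7 --yes-i-really-mean-it

# After swapping the disk, create a new OSD on it (Proxmox helper shown here)
pveceph osd create /dev/sdX
```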
### PGs Stuck / Inactive
- Verify network connectivity between nodes.
- Ensure all MONs are healthy:
  ```bash
  ceph quorum_status
  ```
- Restart Ceph services if needed (see the sketch below for inspecting the affected PGs).
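
To see which PGs are stuck and why, the standard PG queries help (the PG ID `2.1a` is an example):

```bash
# List PGs that have been stuck inactive or unclean
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean

# Detailed state and recovery history of one problematic PG
ceph pg 2.1a query
```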
### Slow Requests
- Check Ceph network latency (a Ceph-side latency sketch follows below):
  ```bash
  ping <node-ip>
  iperf3 -c <node-ip>
  ```
- Verify disk health:
  ```bash
  smartctl -a /dev/sdb
  ```
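
Ceph can also report latency as it sees it, which helps narrow a slow request down to a specific OSD. A small sketch; OSD ID 3 is an example, and the `ceph daemon` command must be run on the node hosting that OSD:

```bash
# Per-OSD commit/apply latency in milliseconds, as measured by Ceph
ceph osd perf

# Recent slow operations on a suspect OSD (run on that OSD's node)
ceph daemon osd.3 dump_historic_ops | less
```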
### Full or Near-Full Cluster
- Add more OSDs (additional disks).
- Adjust pool size or PG count.
- Clean up unused images or snapshots (see the sketch below).
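
Snapshots are a common source of hidden space usage on RBD pools. A minimal cleanup sketch; the pool, image, and snapshot names are examples:

```bash
# Show per-image usage, including space held by snapshots
rbd du -p rbd-vms

# List and remove snapshots of a specific image
rbd snap ls rbd-vms/vm-101-disk-0
rbd snap rm rbd-vms/vm-101-disk-0@old-backup
```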
### MON Quorum Loss
- Ensure at least 2 of 3 MONs are running; a majority is required to maintain quorum (see the sketch below for checking MON status when cluster commands hang).
- Restart MONs:
  ```bash
  systemctl restart ceph-mon@<hostname>
  ```
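
While quorum is lost, regular `ceph` commands may hang, so each MON can be queried locally through its admin socket instead. A small sketch; the hostname `pve1` is an example and the command runs on that MON's own node:

```bash
# Ask a monitor for its own view of quorum, without needing cluster quorum
ceph daemon mon.pve1 mon_status
```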
## 🧹 Maintenance & Housekeeping
- Check logs:
  ```bash
  journalctl -u ceph-mon@<hostname>
  journalctl -u ceph-osd@<id>
  ```
- Rebalance cluster:
  ```bash
  ceph osd reweight-by-utilization
  ```
- Scrub an OSD (data consistency check; see the deep-scrub note below):
  ```bash
  ceph osd scrub <id>
  ```
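
Regular scrubs compare object metadata between replicas; deep scrubs also read the data and verify checksums, so they are heavier but catch silent corruption. Both can be triggered manually (a small sketch):

```bash
# Trigger scrubbing manually on an OSD or an individual PG
ceph osd deep-scrub <id>
ceph pg deep-scrub <pgid>
```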
## 📈 Best Practices
- Dedicate a separate network for Ceph traffic.
- Monitor cluster health daily with `ceph -s` (a cron sketch follows below).
- Use Proxmox Backup Server alongside Ceph for snapshots and deduplication.
- Keep OSDs balanced and monitor PG counts as the cluster grows.
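
One simple way to get that daily check without remembering to run it is a tiny cron-driven script. A sketch, assuming local mail delivery is configured; the script path and recipient address are placeholders:

```bash
#!/bin/bash
# /usr/local/bin/ceph-healthcheck.sh -- mail the health detail whenever the cluster is not HEALTH_OK
# Example crontab entry:  0 7 * * * /usr/local/bin/ceph-healthcheck.sh
STATUS=$(ceph health)
if [ "$STATUS" != "HEALTH_OK" ]; then
    ceph health detail | mail -s "Ceph health: $STATUS" admin@example.com
fi
```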