🧰 Ceph Operations Cheat Sheet

This page provides quick-reference commands and practices for monitoring, troubleshooting, and maintaining the Ceph cluster in this homelab.


🔍 Monitoring Cluster Health

  • Cluster status:
    ceph -s
    
  • Detailed health report:
    ceph health detail
    
  • Monitor quorum status:
    ceph quorum_status --format json-pretty
    
  • OSD layout:
    ceph osd tree
    
  • Pool usage and capacity:
    ceph df
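
For day-to-day monitoring, a minimal health-check sketch built from the commands above; it assumes ceph health prints a line starting with HEALTH_OK on a healthy cluster, and the alerting action is left as a placeholder:
    #!/usr/bin/env bash
    # Sketch: exit non-zero and print details when the cluster is not HEALTH_OK.
    if ! ceph health | grep -q '^HEALTH_OK'; then
        echo "Ceph cluster is not healthy:"
        ceph health detail
        exit 1    # hook this exit code into your alerting of choice
    fi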
    

📦 Pools & Placement Groups

  • List pools:
    ceph osd pool ls
    
  • Pool details:
    ceph osd pool get <pool-name> all
    
  • Placement group statistics:
    ceph pg stat
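
For reference, a sketch of creating a new replicated pool for RBD; the pool name and PG count are placeholders, and on releases with the PG autoscaler enabled the PG count can be omitted:
    ceph osd pool create <pool-name> <pg-num>
    ceph osd pool set <pool-name> size 3
    ceph osd pool application enable <pool-name> rbd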
    

⚙️ OSD Management

  • List OSDs:
    ceph osd ls
    
  • Check OSD status:
    ceph osd stat
    
  • Restart an OSD service:
    systemctl restart ceph-osd@<id>
    
  • Mark OSD out/in:
    ceph osd out <id>
    ceph osd in <id>
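
When retiring an OSD for good, a sketch of the usual drain-then-remove flow; <id> is the numeric OSD id, and the safe-to-destroy check needs a reasonably recent Ceph release:
    ceph osd out <id>                             # stop placing new data on it
    ceph osd safe-to-destroy osd.<id>             # repeat until it reports safe
    systemctl stop ceph-osd@<id>
    ceph osd purge <id> --yes-i-really-mean-it    # remove it from the CRUSH and OSD maps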
    

🛠 Troubleshooting Common Issues

OSD Down / Out

  • Restart the OSD service:
    systemctl restart ceph-osd@<id>
    
  • Replace the failed disk and recreate the OSD if necessary; one common flow is sketched below.
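
A sketch of that replacement flow using ceph-volume; the OSD id and device path are placeholders, and on Proxmox the pveceph osd destroy / pveceph osd create wrappers can be used instead:
    ceph osd purge <id> --yes-i-really-mean-it     # drop the dead OSD from the cluster maps
    ceph-volume lvm zap /dev/<device> --destroy    # wipe the replacement disk
    ceph-volume lvm create --data /dev/<device>    # provision a new OSD on it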

PGs Stuck / Inactive

  • Verify network connectivity between nodes.
  • Ensure all MONs are healthy:
    ceph quorum_status
    
  • Restart the affected Ceph services (OSDs or MONs) if needed.
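
To narrow down which PGs are affected, a sketch using the stuck-PG listings and a per-PG query (the PG id is a placeholder taken from the dump output):
    ceph pg dump_stuck inactive
    ceph pg dump_stuck unclean
    ceph pg <pg-id> query        # shows which OSDs the PG is waiting on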

Slow Requests

  • Check Ceph network latency:
    ping <node-ip>
    iperf3 -c <node-ip>
    
  • Verify disk health:
    smartctl -a /dev/<device>
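
Per-OSD latency counters can also point at a single slow disk; a small sketch, assuming the OSD data disks on a node match the /dev/sd? glob:
    ceph osd perf                  # commit/apply latency per OSD
    for dev in /dev/sd?; do
        smartctl -H "$dev"         # quick SMART pass/fail per disk
    done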
    

Full or Near-Full Cluster

  • Add more OSDs (additional disks).
  • Adjust the pool replication size or PG count.
  • Clean up unused images or snapshots.
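
When cleaning up, per-image usage and snapshot listings help find what is actually consuming space; a sketch, with the pool, image, and snapshot names as placeholders:
    rbd du -p <pool-name>                        # per-image and per-snapshot usage
    rbd snap ls <pool-name>/<image>              # list snapshots of an image
    rbd snap rm <pool-name>/<image>@<snapshot>   # remove a single snapshot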

MON Quorum Loss

  • Ensure a majority of MONs are running (at least 2 in a typical 3-MON setup).
  • Restart MONs:
    systemctl restart ceph-mon@<hostname>
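
To confirm which MONs the cluster currently counts in quorum, before and after restarting:
    ceph mon stat
    ceph mon dump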
    

🧹 Maintenance & Housekeeping

  • Check logs:
    journalctl -u ceph-mon@<hostname>
    journalctl -u ceph-osd@<id>
    
  • Rebalance cluster:
    ceph osd reweight-by-utilization
    
  • Scrub an OSD (data consistency check):
    ceph osd scrub <id>
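
To kick off a deep scrub across every OSD (for example from a monthly cron job), a sketch built from the commands above; note that Ceph also schedules scrubs and deep scrubs automatically:
    for id in $(ceph osd ls); do
        ceph osd deep-scrub "$id"
    done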
    

🚀 Best Practices

  • Dedicate a separate network for Ceph traffic (see the config sketch at the end of this page).
  • Monitor cluster health daily with ceph -s.
  • Use Proxmox Backup Server alongside Ceph for snapshots and deduplication.
  • Keep OSDs balanced and monitor PG counts as the cluster grows.
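
A sketch of how the dedicated-network recommendation usually looks in ceph.conf (on Proxmox this file is typically /etc/pve/ceph.conf); the subnets are placeholders for your own addressing:
    [global]
        public_network  = 192.168.10.0/24    # client / VM traffic (example subnet)
        cluster_network = 192.168.20.0/24    # OSD replication traffic (example subnet)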