🔧 Replace failed disk and recreate the OSD¶
Follow this sequence to safely remove a failed OSD, replace the physical disk, and recreate the OSD in your Ceph cluster. Commands assume Proxmox VE with Ceph and follow the same conventions as your existing setup.
1) Identify the failed OSD and device¶
- Check cluster health and OSD state:
  ceph -s
  ceph osd tree
  ceph osd stat
- Map the OSD ID to its device and host:
  ceph osd find <osd-id>
  # On the host (e.g., pve-1), list disks:
  lsblk -o NAME,SIZE,MODEL,TYPE,MOUNTPOINT
Note the hostname and the device path (e.g., /dev/sdb) associated with the failed OSD.
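If the OSD was created with ceph-volume (which pveceph uses under the hood), you can also map OSD IDs to their backing devices directly on the host; a minimal sketch, assuming ceph-volume is available there:
  # On the OSD's host: list ceph-volume OSDs and the physical devices behind them
  ceph-volume lvm list
  # Optionally restrict the listing to a single device
  ceph-volume lvm list /dev/sdb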
2) Mark the OSD out and stop its service¶
- Mark the OSD out so its data is re-replicated elsewhere:
  ceph osd out <osd-id>
- Stop the OSD daemon on the node hosting it:
  systemctl stop ceph-osd@<osd-id>
Wait for the cluster to begin recovering; ceph -s should show recovering/backfilling PGs.
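If you want to keep an eye on recovery without re-running the command by hand, something like the following works; a small sketch, assuming watch is installed on the node:
  # Refresh cluster status every 10 seconds; Ctrl-C to stop
  watch -n 10 ceph -s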
3) Remove the OSD from the cluster¶
- Remove the CRUSH and auth entries:
  ceph osd crush remove osd.<osd-id>
  ceph auth del osd.<osd-id>
  ceph osd rm <osd-id>
- Optionally verify the OSD is gone:
  ceph osd tree
  ceph -s
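On recent Ceph releases (Luminous and later), ceph osd purge collapses these three commands into one; an alternative sketch, assuming your release supports it:
  # Removes the OSD from CRUSH, deletes its auth key, and removes the OSD entry
  ceph osd purge <osd-id> --yes-i-really-mean-it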
4) Replace the physical disk¶
- Power down the node if needed and replace the failed disk.
- After boot, verify the new disk appears (e.g., /dev/sdb):
  lsblk -o NAME,SIZE,MODEL,TYPE
  dmesg | tail
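If the disk was hot-swapped and does not show up without a reboot, rescanning the SCSI host can help; a sketch, assuming host0 is the controller the drive bay hangs off (adjust the host number for your hardware):
  # Trigger a rescan of the SCSI bus, then re-check for the new disk
  echo "- - -" > /sys/class/scsi_host/host0/scan
  lsblk -o NAME,SIZE,MODEL,TYPE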
5) Wipe and prepare the new disk¶
If the new disk contains old metadata or partitions, wipe it fully:
# Zap partitions and signatures (CAUTION: destructive)
sgdisk --zap-all /dev/sdb
wipefs -a /dev/sdb
partprobe /dev/sdb
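If the disk previously carried a ceph-volume OSD, zapping it with ceph-volume also tears down any leftover LVM volumes and groups; an alternative sketch, assuming ceph-volume is present on the node:
  # CAUTION: destructive - removes LVM remnants and wipes the device
  ceph-volume lvm zap --destroy /dev/sdb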
6) Create the new OSD on the replacement disk¶
You’ve been using Proxmox’s pveceph tooling. Create the OSD with the same method:
# On the node hosting the new disk
pveceph osd create /dev/sdb
This handles preparing the disk (via ceph-volume), allocating a new OSD ID, creating the keyring, and registering the OSD in CRUSH.
7) Verify OSD daemon and CRUSH placement¶
- Check service state:
  systemctl status ceph-osd@<new-id>
- Verify it appears in the cluster:
  ceph osd tree
  ceph osd stat
  ceph -s
The new OSD should show as “up/in”. If it’s “up/out”, run ceph osd in <new-id>.
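To double-check that the new OSD really landed on the replacement disk and the expected host, the OSD metadata is handy; a sketch, assuming a current ceph CLI:
  # Shows hostname, device paths, and other daemon details for the OSD
  ceph osd metadata <new-id>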
8) Reweight and allow rebalancing¶
- Optionally reweight by utilization to balance data:
  ceph osd reweight-by-utilization
- Monitor backfill/recovery:
  ceph -s
  ceph health detail
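Before committing to a reweight, you can preview what reweight-by-utilization would change; a sketch, assuming your Ceph release ships the dry-run variant:
  # Dry run: reports the OSDs and weights that would change, without applying anything
  ceph osd test-reweight-by-utilization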
9) Post-replacement checks¶
- Disk health (SMART):
  smartctl -a /dev/sdb
- Pool capacity and PGs:
  ceph df
  ceph pg stat
- Logs on the node:
  journalctl -u ceph-osd@<new-id> -f
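For a per-OSD view of utilization that includes the new OSD alongside its peers, ceph osd df is useful; a small sketch:
  # Utilization, weight, and PG count per OSD, grouped by CRUSH hierarchy
  ceph osd df tree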
Notes and tips¶
- If pveceph osd create fails, check for lingering partitions or LVM on the device; ensure it’s fully wiped before retrying.
- Keep the Ceph network healthy; OSD replacement triggers heavy traffic during rebalancing.
- For consistent performance, match SSD models and sizes across nodes when possible.
- If you used dedicated DB/WAL devices previously, recreate the OSD with the same layout (pveceph supports passing devices for advanced setups).
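For the dedicated DB/WAL case, pveceph accepts extra device parameters; a hedged sketch, assuming current Proxmox VE option names and a hypothetical shared NVMe (verify with pveceph osd create --help before running):
  # Hypothetical layout: data on /dev/sdb, RocksDB/WAL on /dev/nvme0n1
  pveceph osd create /dev/sdb --db_dev /dev/nvme0n1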
Direct answer: mark the failed OSD out, stop and remove it from Ceph, replace the disk, wipe it, recreate the OSD with pveceph osd create /dev/sdb, verify it’s up/in, and monitor rebalancing until health is OK.