# 🧰 Ceph Operations Cheat Sheet
This page provides quick reference commands and practices for monitoring, troubleshooting, and maintaining your Ceph cluster in the homelab.
## 📊 Monitoring Cluster Health
- Cluster status:
  ```bash
  ceph -s
  ```
- Detailed health report:
  ```bash
  ceph health detail
  ```
- Monitor quorum status:
  ```bash
  ceph quorum_status --format json-pretty
  ```
- OSD layout:
  ```bash
  ceph osd tree
  ```
- Pool usage and capacity:
  ```bash
  ceph df
  ```
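
For a continuously updating view while you work on the cluster, two standard options are `watch` and Ceph's built-in follow mode (a small sketch; the 5-second interval is just an example):

```bash
# Keep a live view of cluster state open during maintenance (refreshes every 5 s)
watch -n 5 ceph -s

# Or stream health and cluster log events as they happen
ceph -w
```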
## 📦 Pool & Placement Groups
- List pools:
  ```bash
  ceph osd pool ls
  ```
- Pool details:
  ```bash
  ceph osd pool get <pool-name> all
  ```
- Placement group statistics:
  ```bash
  ceph pg stat
  ```
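
If you create a new pool, letting the PG autoscaler manage `pg_num` saves manual tuning. A minimal sketch, assuming a replicated RBD pool; the pool name `rbd-vms` and the initial PG count are examples:

```bash
# Create a replicated pool with an initial PG count (the autoscaler can adjust it later)
ceph osd pool create rbd-vms 32 32 replicated
ceph osd pool application enable rbd-vms rbd
ceph osd pool set rbd-vms pg_autoscale_mode on

# Review what the autoscaler thinks each pool should have
ceph osd pool autoscale-status
```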
## ⚙️ OSD Management
- List OSDs:
  ```bash
  ceph osd ls
  ```
- Check OSD status:
  ```bash
  ceph osd stat
  ```
- Restart an OSD service:
  ```bash
  systemctl restart ceph-osd@<id>
  ```
- Mark OSD out/in (see the maintenance sketch after this list):
  ```bash
  ceph osd out <id>
  ceph osd in <id>
  ```
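
When taking a node down briefly (reboots, disk swaps), it is usually better to suppress rebalancing than to mark OSDs out by hand. A minimal sketch using the standard `noout` flag:

```bash
# Prevent OSDs from being marked "out" while a node is down for maintenance
ceph osd set noout

# ...reboot or service the node...

# Restore normal behaviour and confirm the cluster recovers
ceph osd unset noout
ceph -s
```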
## 🔍 Troubleshooting Common Issues
### OSD Down / Out
- Restart the OSD service:
  ```bash
  systemctl restart ceph-osd@<id>
  ```
- Replace the failed disk and recreate the OSD if necessary (a sketch follows below).
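
One possible disk-replacement flow, assuming the failed OSD has ID 7 and the cluster is managed through Proxmox VE; the ID and device path are placeholders:

```bash
# Remove the dead OSD from the cluster
ceph osd out 7
systemctl stop ceph-osd@7
ceph osd purge 7 --yes-i-really-mean-it

# After swapping the disk, create a new OSD on it (Proxmox helper shown here)
pveceph osd create /dev/sdX
```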
### PGs Stuck / Inactive
- Verify network connectivity between nodes.
- Ensure all MONs are healthy:
  ```bash
  ceph quorum_status
  ```
- Restart Ceph services if needed (see the sketch below for inspecting the affected PGs).
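
To see which PGs are stuck and why, the standard PG queries help (the PG ID `2.1a` is an example):

```bash
# List PGs that have been stuck inactive or unclean
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean

# Detailed state and recovery history of one problematic PG
ceph pg 2.1a query
```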
### Slow Requests
- Check Ceph network latency (a Ceph-side latency sketch follows below):
  ```bash
  ping <node-ip>
  iperf3 -c <node-ip>
  ```
- Verify disk health:
  ```bash
  smartctl -a /dev/sdb
  ```
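
Ceph can also report latency as it sees it, which helps narrow a slow request down to a specific OSD. A small sketch; OSD ID 3 is an example, and the `ceph daemon` command must be run on the node hosting that OSD:

```bash
# Per-OSD commit/apply latency in milliseconds, as measured by Ceph
ceph osd perf

# Recent slow operations on a suspect OSD (run on that OSD's node)
ceph daemon osd.3 dump_historic_ops | less
```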
### Full or Near-Full Cluster
- Add more OSDs (additional disks).
- Adjust pool size or PG count.
- Clean up unused images or snapshots (see the sketch below).
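
Snapshots are a common source of hidden space usage on RBD pools. A minimal cleanup sketch; the pool, image, and snapshot names are examples:

```bash
# Show per-image usage, including space held by snapshots
rbd du -p rbd-vms

# List and remove snapshots of a specific image
rbd snap ls rbd-vms/vm-101-disk-0
rbd snap rm rbd-vms/vm-101-disk-0@old-backup
```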
### MON Quorum Loss
- Ensure at least 2 of 3 MONs are running; a majority is required to maintain quorum (see the sketch below for checking MON status when cluster commands hang).
- Restart MONs:
  ```bash
  systemctl restart ceph-mon@<hostname>
  ```
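
While quorum is lost, regular `ceph` commands may hang, so each MON can be queried locally through its admin socket instead. A small sketch; the hostname `pve1` is an example and the command runs on that MON's own node:

```bash
# Ask a monitor for its own view of quorum, without needing cluster quorum
ceph daemon mon.pve1 mon_status
```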
## 🧹 Maintenance & Housekeeping
- Check logs:
  ```bash
  journalctl -u ceph-mon@<hostname>
  journalctl -u ceph-osd@<id>
  ```
- Rebalance cluster:
  ```bash
  ceph osd reweight-by-utilization
  ```
- Scrub an OSD (data consistency check; see the deep-scrub note below):
  ```bash
  ceph osd scrub <id>
  ```
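
Regular scrubs compare object metadata between replicas; deep scrubs also read the data and verify checksums, so they are heavier but catch silent corruption. Both can be triggered manually (a small sketch):

```bash
# Trigger scrubbing manually on an OSD or an individual PG
ceph osd deep-scrub <id>
ceph pg deep-scrub <pgid>
```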
## 📈 Best Practices
- Dedicate a separate network for Ceph traffic.
- Monitor cluster health daily with `ceph -s` (a cron sketch follows below).
- Use Proxmox Backup Server alongside Ceph for snapshots and deduplication.
- Keep OSDs balanced and monitor PG counts as the cluster grows.
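
One simple way to get that daily check without remembering to run it is a tiny cron-driven script. A sketch, assuming local mail delivery is configured; the script path and recipient address are placeholders:

```bash
#!/bin/bash
# /usr/local/bin/ceph-healthcheck.sh -- mail the health detail whenever the cluster is not HEALTH_OK
# Example crontab entry:  0 7 * * * /usr/local/bin/ceph-healthcheck.sh
STATUS=$(ceph health)
if [ "$STATUS" != "HEALTH_OK" ]; then
    ceph health detail | mail -s "Ceph health: $STATUS" admin@example.com
fi
```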