Remove OSD from Storage Pool
Scenario: One of the nodes has failed to power up with no reason. The OSDs that host by the failed node went offline, so we have to recover the storage pool ASAP from the health_warn
status. We could do it from any host of the cluster.
Connect to the Console
ssh [email protected]
Warning: Permanently added '192.168.1.x' (ECDSA) to the list of known hosts.
Password:
Check Storage Status
As shown below, we have 2 OSDs down due to a failed hard disk, OSD No.56 and No.58 on node hostname (s3).
The health status is showing HEALTH_WARN
.
s1:storage> status
cluster:
id: c6e64c49-09cf-463b-9d1c-b6645b4b3b85
health: HEALTH_WARN
2 osds down
Degraded data redundancy: 1611/44204 objects degraded (3.644%), 106 pgs degraded
services:
mon: 3 daemons, quorum s1,s2,s3
mgr: s1(active), standbys: s2, s3
mds: cephfs-1/1/1 up {0=s3=up:active}, 2 up:standby
osd: 60 osds: 58 up, 60 in
rgw: 3 daemons active
data:
pools: 22 pools, 5488 pgs
objects: 21.59k objects, 82.8GiB
usage: 232GiB used, 92.3TiB / 92.6TiB avail
pgs: 1611/44204 objects degraded (3.644%)
4873 active+clean
509 active+undersized
106 active+undersized+degraded
io:
client: 0B/s rd, 1.88MiB/s wr, 2.74kop/s rd, 64op/s wr
recovery: 5B/s, 0objects/s
cache: 0op/s promote
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | s1 | 1597M | 445G | 0 | 0 | 0 | 4096 | exists,up |
| 1 | s1 | 1543M | 445G | 0 | 819 | 0 | 1638 | exists,up |
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
| 54 | s3 | 4562M | 1858G | 2 | 64.0k | 8 | 32.8k | exists,up |
| 55 | s2 | 3784M | 1859G | 6 | 119k | 429 | 1717k | exists,up |
| 56 | s3 | 3552M | 1859G | 0 | 0 | 0 | 0 | exists |
| 57 | s2 | 5285M | 1857G | 3 | 49.6k | 12 | 76.8k | exists,up |
| 58 | s3 | 4921M | 1858G | 0 | 0 | 0 | 0 | exists |
| 59 | s2 | 3865M | 1859G | 1 | 17.6k | 2 | 9011 | exists,up |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
Remove OSDs
CLI: storage> remove_osd
Remove all failed OSDs from the list.
s1> storage
s1:storage> remove_osd
Enter osd id to be removed:
1: osd.56 (hdd)
2: osd.58 (hdd)
Enter index: 1
Enter 'YES' to confirm: YES
Remove osd.58 successfully.
After all failed OSDs are removed, please check the storage health with CLI storage> status
The health status should be showing HEALTH_OK
.
s1:storage> status
cluster:
id: c6e64c49-09cf-463b-9d1c-b6645b4b3b85
health: HEALTH_OK
services:
mon: 3 daemons, quorum s1,s2,s3
mgr: s1(active), standbys: s2, s3
mds: cephfs-1/1/1 up {0=s3=up:active}, 2 up:standby
osd: 58 osds: 58 up, 58 in
rgw: 3 daemons active
data:
pools: 22 pools, 5488 pgs
objects: 21.59k objects, 82.8GiB
usage: 229GiB used, 88.7TiB / 88.9TiB avail
pgs: 5488 active+clean
io:
client: 132KiB/s rd, 5.44KiB/s wr, 159op/s rd, 0op/s wr
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | s1 | 1594M | 445G | 0 | 0 | 0 | 0 | exists,up |
| 1 | s1 | 1536M | 445G | 0 | 0 | 0 | 0 | exists,up |
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
| 54 | s3 | 4665M | 1858G | 3 | 29.6k | 0 | 0 | exists,up |
| 55 | s2 | 3769M | 1859G | 0 | 0 | 0 | 0 | exists,up |
| 57 | s2 | 5366M | 1857G | 0 | 819 | 0 | 0 | exists,up |
| 59 | s2 | 3851M | 1859G | 0 | 0 | 0 | 0 | exists,up |
+----+------+-------+-------+--------+---------+--------+---------+-----------+