Remove OSD from storage pool
Recover your storage pool from HEALTH_WARN status. You can do this from any host of your cluster.

Story: One of your nodes has failed to power up for no reason, and the OSDs hosted by the failed node went offline, so we have to recover the storage pool from HEALTH_WARN status as soon as possible.

- Connect to one of your (live) hosts
$ ssh [email protected]
Warning: Permanently added '192.168.1.x' (ECDSA) to the list of known hosts.
Password:
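A minimal connection sketch, assuming you log in to any surviving node and then enter the storage shell used in the rest of this guide. The address and user name below are placeholders, not values from this cluster:

# placeholder address/user -- substitute any live node of your cluster
$ ssh admin@192.168.1.11
Password:
# enter the storage CLI used in the following steps
s1> storage
s1:storage>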
- Check the storage status before you start
- As shown below, we have 2 OSDs down due to a failed hard disk: osd.56 and osd.58 on node s3
- The health status shows HEALTH_WARN
s1:storage> status
  cluster:
    id:     c6e64c49-09cf-463b-9d1c-b6645b4b3b85
    health: HEALTH_WARN
            2 osds down
            Degraded data redundancy: 1611/44204 objects degraded (3.644%), 106 pgs degraded

  services:
    mon: 3 daemons, quorum s1,s2,s3
    mgr: s1(active), standbys: s2, s3
    mds: cephfs-1/1/1 up {0=s3=up:active}, 2 up:standby
    osd: 60 osds: 58 up, 60 in
    rgw: 3 daemons active

  data:
    pools:   22 pools, 5488 pgs
    objects: 21.59k objects, 82.8GiB
    usage:   232GiB used, 92.3TiB / 92.6TiB avail
    pgs:     1611/44204 objects degraded (3.644%)
             4873 active+clean
             509  active+undersized
             106  active+undersized+degraded

  io:
    client:   0B/s rd, 1.88MiB/s wr, 2.74kop/s rd, 64op/s wr
    recovery: 5B/s, 0objects/s
    cache:    0op/s promote

+----+------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used  | avail | wr ops | wr data | rd ops | rd data | state     |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | s1   | 1597M |  445G |    0   |     0   |    0   |  4096   | exists,up |
| 1  | s1   | 1543M |  445G |    0   |   819   |    0   |  1638   | exists,up |
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
| 54 | s3   | 4562M | 1858G |    2   |  64.0k  |    8   |  32.8k  | exists,up |
| 55 | s2   | 3784M | 1859G |    6   |   119k  |  429   |  1717k  | exists,up |
| 56 | s3   | 3552M | 1859G |    0   |     0   |    0   |     0   | exists    |
| 57 | s2   | 5285M | 1857G |    3   |  49.6k  |   12   |  76.8k  | exists,up |
| 58 | s3   | 4921M | 1858G |    0   |     0   |    0   |     0   | exists    |
| 59 | s2   | 3865M | 1859G |    1   |  17.6k  |    2   |  9011   | exists,up |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
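If your nodes also expose the standard Ceph command-line tools (an assumption here; the vendor storage> shell may be the only interface you need), the same degraded state can be cross-checked directly, for example:

# summarize cluster health and the reason for HEALTH_WARN
$ ceph -s
$ ceph health detail
# show the OSD map per host; the failed OSDs are reported as down
$ ceph osd tree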
- Start removing the failed OSDs with the CLI

storage> remove_osd

Remove all failed OSDs from the list (run the command again for each remaining failed OSD):

s1> storage
s1:storage> remove_osd
Enter osd id to be removed:
1: osd.56 (hdd)
2: osd.58 (hdd)
Enter index: 1
Enter 'YES' to confirm: YES
Remove osd.58 successfully.
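For reference only: on a cluster managed with the upstream Ceph CLI (not the vendor remove_osd command above), an equivalent cleanup of one failed OSD, using osd.56 as the example id, would look roughly like this on Luminous or later:

# mark the OSD out so its data is re-replicated to the surviving OSDs
$ ceph osd out 56
# remove it from the CRUSH map, delete its auth key and its OSD entry in one step
$ ceph osd purge 56 --yes-i-really-mean-it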
- After all failed OSDs are removed, check your storage health with the CLI

storage> status

- The health status now shows HEALTH_OK
s1:storage> status
  cluster:
    id:     c6e64c49-09cf-463b-9d1c-b6645b4b3b85
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum s1,s2,s3
    mgr: s1(active), standbys: s2, s3
    mds: cephfs-1/1/1 up {0=s3=up:active}, 2 up:standby
    osd: 58 osds: 58 up, 58 in
    rgw: 3 daemons active

  data:
    pools:   22 pools, 5488 pgs
    objects: 21.59k objects, 82.8GiB
    usage:   229GiB used, 88.7TiB / 88.9TiB avail
    pgs:     5488 active+clean

  io:
    client: 132KiB/s rd, 5.44KiB/s wr, 159op/s rd, 0op/s wr

+----+------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used  | avail | wr ops | wr data | rd ops | rd data | state     |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | s1   | 1594M |  445G |    0   |     0   |    0   |     0   | exists,up |
| 1  | s1   | 1536M |  445G |    0   |     0   |    0   |     0   | exists,up |
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
| 54 | s3   | 4665M | 1858G |    3   |  29.6k  |    0   |     0   | exists,up |
| 55 | s2   | 3769M | 1859G |    0   |     0   |    0   |     0   | exists,up |
| 57 | s2   | 5366M | 1857G |    0   |   819   |    0   |     0   | exists,up |
| 59 | s2   | 3851M | 1859G |    0   |     0   |    0   |     0   | exists,up |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
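If you want to double-check the recovery outside the vendor shell (again assuming the standard ceph tools are available on the node), a couple of commands are enough:

# confirm all PGs are active+clean and the OSD count has dropped to 58
$ ceph -s
# per-OSD utilization, useful to verify data was re-replicated evenly
$ ceph osd df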