Remove an OSD from the storage pool
Story: One of your nodes has failed to power up for no apparent reason, and the OSDs hosted by that node have gone offline. We have to recover the storage pool from its HEALTH_WARN status as soon as possible. You can do this from any live host in your cluster.
- Connect to one of your live hosts:
$ ssh <user>@192.168.1.x
Warning: Permanently added '192.168.1.x' (ECDSA) to the list of known hosts.
Password:
- Check the storage status before you start.
- As shown below, we have 2 OSDs down due to a failed hard disk: osd.56 and osd.58 on node s3 (the equivalent raw Ceph checks are sketched after the OSD table below).
- The health status shows HEALTH_WARN:
s1:storage> status
  cluster:
    id:     c6e64c49-09cf-463b-9d1c-b6645b4b3b85
    health: HEALTH_WARN
            2 osds down
            Degraded data redundancy: 1611/44204 objects degraded (3.644%), 106 pgs degraded

  services:
    mon: 3 daemons, quorum s1,s2,s3
    mgr: s1(active), standbys: s2, s3
    mds: cephfs-1/1/1 up {0=s3=up:active}, 2 up:standby
    osd: 60 osds: 58 up, 60 in
    rgw: 3 daemons active

  data:
    pools:   22 pools, 5488 pgs
    objects: 21.59k objects, 82.8GiB
    usage:   232GiB used, 92.3TiB / 92.6TiB avail
    pgs:     1611/44204 objects degraded (3.644%)
             4873 active+clean
             509  active+undersized
             106  active+undersized+degraded

  io:
    client:   0B/s rd, 1.88MiB/s wr, 2.74kop/s rd, 64op/s wr
    recovery: 5B/s, 0objects/s
    cache:    0op/s promote
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | s1 | 1597M | 445G | 0 | 0 | 0 | 4096 | exists,up |
| 1 | s1 | 1543M | 445G | 0 | 819 | 0 | 1638 | exists,up |
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
| 54 | s3 | 4562M | 1858G | 2 | 64.0k | 8 | 32.8k | exists,up |
| 55 | s2 | 3784M | 1859G | 6 | 119k | 429 | 1717k | exists,up |
| 56 | s3 | 3552M | 1859G | 0 | 0 | 0 | 0 | exists |
| 57 | s2 | 5285M | 1857G | 3 | 49.6k | 12 | 76.8k | exists,up |
| 58 | s3 | 4921M | 1858G | 0 | 0 | 0 | 0 | exists |
| 59 | s2 | 3865M | 1859G | 1 | 17.6k | 2 | 9011 | exists,up |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
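- Note: the status output above matches the format of ceph -s, so this storage shell appears to be a front end for Ceph. If the plain ceph CLI is also available on the host (an assumption, not shown in this article), the same pre-checks can be run directly; a minimal sketch for a standard Ceph deployment:
# Assumes a standard Ceph (Luminous or later) cluster with the ceph CLI installed on this host.
$ ceph -s                # overall cluster health, the same summary as the status output above
$ ceph health detail     # lists exactly which OSDs are down and which PGs are degraded
$ ceph osd tree down     # shows only the OSDs currently marked down, grouped by host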
- Remove the failed OSDs with the CLI:
storage> remove_osd
- Remove every failed OSD from the list, repeating the command for each one (a raw-Ceph equivalent is sketched after the transcript).
s1> storage
s1:storage> remove_osd
Enter osd id to be removed:
1: osd.56 (hdd)
2: osd.58 (hdd)
Enter index: 1
Enter 'YES' to confirm: YES
Remove osd.56 successfully.
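- For reference, remove_osd appears to wrap the standard Ceph OSD-removal steps. A minimal raw-Ceph equivalent (an assumption, not shown by the vendor CLI), for a Luminous-or-later cluster with osd.56 as the failed OSD:
# Assumes direct access to the ceph CLI; replace 56 with the id of each failed OSD.
$ ceph osd out 56                              # mark the OSD out so data stops being mapped to it
$ ceph osd purge 56 --yes-i-really-mean-it     # remove it from the CRUSH map, delete its auth key, and remove it from the OSD map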
- After all failed OSDs are removed, check the storage health again with the CLI:
storage> status
- The health status now shows HEALTH_OK:
s1:storage> status
  cluster:
    id:     c6e64c49-09cf-463b-9d1c-b6645b4b3b85
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum s1,s2,s3
    mgr: s1(active), standbys: s2, s3
    mds: cephfs-1/1/1 up {0=s3=up:active}, 2 up:standby
    osd: 58 osds: 58 up, 58 in
    rgw: 3 daemons active

  data:
    pools:   22 pools, 5488 pgs
    objects: 21.59k objects, 82.8GiB
    usage:   229GiB used, 88.7TiB / 88.9TiB avail
    pgs:     5488 active+clean

  io:
    client: 132KiB/s rd, 5.44KiB/s wr, 159op/s rd, 0op/s wr
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | s1 | 1594M | 445G | 0 | 0 | 0 | 0 | exists,up |
| 1 | s1 | 1536M | 445G | 0 | 0 | 0 | 0 | exists,up |
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
| 54 | s3 | 4665M | 1858G | 3 | 29.6k | 0 | 0 | exists,up |
| 55 | s2 | 3769M | 1859G | 0 | 0 | 0 | 0 | exists,up |
| 57 | s2 | 5366M | 1857G | 0 | 819 | 0 | 0 | exists,up |
| 59 | s2 | 3851M | 1859G | 0 | 0 | 0 | 0 | exists,up |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
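- Optional: if recovery is still in progress after the removal, you can poll until the cluster reports HEALTH_OK again. A small sketch, assuming the ceph CLI is available on the host:
# Poll the cluster health every 30 seconds until it returns to HEALTH_OK.
until ceph health | grep -q HEALTH_OK; do
    echo "still recovering: $(ceph health)"
    sleep 30
done
echo "cluster is healthy again"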