
Remove OSD from Storage Pool

Scenario: One of the nodes has failed to power up for no apparent reason. The OSDs hosted by the failed node went offline, so the storage pool must be recovered from the HEALTH_WARN status as soon as possible. The recovery can be performed from any host in the cluster.

Access the Desired Node via SSH

ssh <user>@192.168.1.x

Warning: Permanently added '192.168.1.x' (ECDSA) to the list of known hosts.
Password:
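
Because the recovery can be started from any host in the cluster, you can connect to any node that is still up, such as cc1 or cc3. If you connect to the nodes frequently, key-based authentication is an optional convenience; the commands below are a minimal sketch assuming standard OpenSSH tooling on your workstation, with <user> and 192.168.1.x as placeholders for your actual account and node address.

ssh-keygen -t ed25519
ssh-copy-id <user>@192.168.1.x
ssh <user>@192.168.1.x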

Check Storage Status

Use the storage status command.

As shown below, 1 OSD is down due to a failed hard disk: osd.7 on the node with hostname cc2.

The health status shows HEALTH_WARN.

cc1> storage status
  cluster:
    id:     c6e64c49-09cf-463b-9d1c-b6645b4b3b85
    health: HEALTH_WARN
            1 osds down
            Degraded data redundancy: 1418/251600 objects degraded (0.564%), 72 pgs degraded

  services:
    mon: 3 daemons, quorum cc1,cc2,cc3 (age 101m)
    mgr: cc1(active, since 101m), standbys: cc3, cc2
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 18 osds: 17 up (since 60s), 18 in (since 2m)
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    pools:   25 pools, 945 pgs
    objects: 87.83k objects, 449 GiB
    usage:   1.2 TiB used, 5.3 TiB / 6.5 TiB avail
    pgs:     1418/251600 objects degraded (0.564%)
             728 active+clean
             145 active+undersized
             72  active+undersized+degraded

  io:
    client: 61 KiB/s rd, 503 KiB/s wr, 18 op/s rd, 76 op/s wr

ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 cc1 46.6G 325G 15 376k 0 0 exists,up
1 cc1 85.7G 286G 4 23.2k 1 20 exists,up
2 cc1 75.4G 296G 16 72.7k 1 0 exists,up
3 cc1 70.8G 301G 5 60.7k 0 0 exists,up
4 cc1 62.5G 309G 7 54.3k 1 0 exists,up
5 cc1 66.9G 305G 0 0 1 90 exists,up
6 cc2 73.2G 298G 6 46.3k 0 0 exists,up
7 cc2 5575M 366G 0 0 0 0 exists
8 cc2 77.5G 294G 0 5734 1 0 exists,up
9 cc2 97.4G 274G 13 136k 0 0 exists,up
10 cc2 80.5G 291G 6 55.1k 0 0 exists,up
11 cc2 68.4G 303G 5 39.1k 1 0 exists,up
12 cc3 84.1G 288G 0 0 1 0 exists,up
13 cc3 52.1G 320G 1 8210 1 48.0k exists,up
14 cc3 62.8G 309G 13 116k 2 44.0k exists,up
15 cc3 51.1G 321G 5 56.7k 2 0 exists,up
16 cc3 69.0G 303G 3 37.7k 2 135 exists,up
17 cc3 87.1G 285G 8 372k 0 0 exists,up
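
The storage status output closely resembles the native Ceph status views. If the underlying Ceph CLI happens to be exposed on your deployment (an assumption, not something every appliance shell allows), the same information can be cross-checked with standard Ceph commands:

# overall cluster health, services, and placement-group summary
ceph -s

# per-OSD utilization and state, similar to the table above
ceph osd status

# list the CRUSH tree and pick out OSDs reported as down
ceph osd tree | grep -i down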

Remove OSDs

Use the storage remove_osd command to remove the down OSD.

cc1> storage remove_osd
Enter osd id to be removed:
1: osd.7
Enter index: 1
Enter 'YES' to confirm: YES
cc1>
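
For reference, the storage remove_osd wrapper automates what would otherwise be a manual removal in Ceph. A rough sketch of the equivalent native steps, assuming direct access to the Ceph tooling (which may not be exposed on every deployment), would look like this for osd.7:

# stop the OSD daemon on its host, if that node is still reachable
systemctl stop ceph-osd@7

# mark the OSD out so its data is rebalanced onto the remaining OSDs
ceph osd out 7

# remove it from the CRUSH map, delete its auth key, and drop the OSD entry
ceph osd purge 7 --yes-i-really-mean-it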

Verify Storage Status

After removing the failed OSD, check the storage health again with the storage status command.

The health status should now show HEALTH_OK.

cc1> storage status
  cluster:
    id:     c6e64c49-09cf-463b-9d1c-b6645b4b3b85
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum cc1,cc2,cc3 (age 93m)
    mgr: cc1(active, since 92m), standbys: cc3, cc2
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 17 osds: 17 up (since 60s), 17 in (since 2m); 10 remapped pgs
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   25 pools, 945 pgs
    objects: 87.82k objects, 449 GiB
    usage:   1.2 TiB used, 5.0 TiB / 6.2 TiB avail
    pgs:     1250/251643 objects misplaced (0.497%)
             935 active+clean
             10  active+remapped+backfilling

  io:
    client:   340 KiB/s rd, 428 KiB/s wr, 49 op/s rd, 74 op/s wr
    recovery: 151 MiB/s, 21 objects/s

ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 cc1 46.6G 325G 15 376k 0 0 exists,up
1 cc1 85.7G 286G 4 23.2k 1 20 exists,up
2 cc1 75.4G 296G 16 72.7k 1 0 exists,up
3 cc1 70.8G 301G 5 60.7k 0 0 exists,up
4 cc1 62.5G 309G 7 54.3k 1 0 exists,up
5 cc1 66.9G 305G 0 0 1 90 exists,up
6 cc2 73.2G 298G 6 46.3k 0 0 exists,up
8 cc2 77.5G 294G 0 5734 1 0 exists,up
9 cc2 97.4G 274G 13 136k 0 0 exists,up
10 cc2 80.5G 291G 6 55.1k 0 0 exists,up
11 cc2 68.4G 303G 5 39.1k 1 0 exists,up
12 cc3 84.1G 288G 0 0 1 0 exists,up
13 cc3 52.1G 320G 1 8210 1 48.0k exists,up
14 cc3 62.8G 309G 13 116k 2 44.0k exists,up
15 cc3 51.1G 321G 5 56.7k 2 0 exists,up
16 cc3 69.0G 303G 3 37.7k 2 135 exists,up
17 cc3 87.1G 285G 8 372k 0 0 exists,up
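
Backfilling continues in the background until the remapped placement groups return to active+clean. Re-running storage status periodically is enough to follow the recovery; if the native Ceph CLI is available (again an assumption about your deployment), progress can also be watched directly:

# one-line summary of placement-group states
ceph pg stat

# stream status and health changes as they happen
ceph -w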