Remove Hard Disks
Scenario: A hard disk failure has been identified. To maintain storage pool health, remove the failed disk from the cluster and initiate the recovery process.
Access the Desired Node via SSH
ssh <username>@192.168.1.x
Warning: Permanently added '192.168.1.x' (ECDSA) to the list of known hosts.
Password:
Verify storage status before removal
Use the command storage > status.
The storage status command reports the overall status of the storage cluster. If errors occur, the health row displays HEALTH_WARN. In this case, the disk hosting OSD 7 has failed: OSD 7 no longer shows the up state in the listing below, which produces the 1 osds down error message.
cc1> storage status
cluster:
id: c6e64c49-09cf-463b-9d1c-b6645b4b3b85
health: HEALTH_WARN
1 osds down
Degraded data redundancy: 1418/251600 objects degraded (0.564%), 72 pgs degraded
services:
mon: 3 daemons, quorum cc1,cc2,cc3 (age 101m)
mgr: cc1(active, since 101m), standbys: cc3, cc2
mds: 1/1 daemons up, 1 standby, 1 hot standby
osd: 18 osds: 17 up (since 60s), 18 in (since 2m)
rgw: 3 daemons active (3 hosts, 1 zones)
data:
pools: 25 pools, 945 pgs
objects: 87.83k objects, 449 GiB
usage: 1.2 TiB used, 5.3 TiB / 6.5 TiB avail
pgs: 1418/251600 objects degraded (0.564%)
728 active+clean
145 active+undersized
72 active+undersized+degraded
io:
client: 61 KiB/s rd, 503 KiB/s wr, 18 op/s rd, 76 op/s wr
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 cc1 46.6G 325G 15 376k 0 0 exists,up
1 cc1 85.7G 286G 4 23.2k 1 20 exists,up
2 cc1 75.4G 296G 16 72.7k 1 0 exists,up
3 cc1 70.8G 301G 5 60.7k 0 0 exists,up
4 cc1 62.5G 309G 7 54.3k 1 0 exists,up
5 cc1 66.9G 305G 0 0 1 90 exists,up
6 cc2 73.2G 298G 6 46.3k 0 0 exists,up
7 cc2 5575M 366G 0 0 0 0 exists
8 cc2 77.5G 294G 0 5734 1 0 exists,up
9 cc2 97.4G 274G 13 136k 0 0 exists,up
10 cc2 80.5G 291G 6 55.1k 0 0 exists,up
11 cc2 68.4G 303G 5 39.1k 1 0 exists,up
12 cc3 84.1G 288G 0 0 1 0 exists,up
13 cc3 52.1G 320G 1 8210 1 48.0k exists,up
14 cc3 62.8G 309G 13 116k 2 44.0k exists,up
15 cc3 51.1G 321G 5 56.7k 2 0 exists,up
16 cc3 69.0G 303G 3 37.7k 2 135 exists,up
17 cc3 87.1G 285G 8 372k 0 0 exists,up
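Because the storage service is backed by Ceph, the failed OSD can also be pinpointed with standard Ceph tooling when shell access to a node with the admin keyring is available. This is a minimal sketch for reference, not part of the storage CLI shown above:
# Explain the reason behind HEALTH_WARN, including which OSDs are down
ceph health detail
# List only the OSDs that are currently down, together with the host that owns them
ceph osd tree down
In this example, both commands should point at osd.7 on host cc2.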
Remove a Disk
Use the command storage > remove_disk.
There are two modes for removing a disk: safe mode and force mode.
- In safe mode, Ceph removes the disk gracefully by first migrating the existing data, transferring all PGs on the disk to other OSDs (roughly the manual Ceph procedure sketched after the warning below).
- In force mode, the cluster skips data migration and the specified drive is forcibly removed. This carries a high risk of corrupting the Ceph cluster.
Forcefully removing a disk may result in data corruption. Use force mode only if you understand the risks and require immediate removal of the data disk.
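For background, safe mode corresponds roughly to the standard Ceph drain-and-purge sequence below. The storage > remove_disk command automates these steps; the sketch only illustrates what migrating the data involves, using osd.7 as an example ID:
# Mark the OSD out so Ceph starts moving its PGs onto other OSDs
ceph osd out 7
# Wait until Ceph reports the OSD can be destroyed without losing redundancy
while ! ceph osd safe-to-destroy osd.7 ; do sleep 10 ; done
# Remove the OSD from the CRUSH map, its auth key, and the OSD map
ceph osd purge 7 --yes-i-really-mean-it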
After entering the command, the output shows that /dev/sdb is associated with OSDs 6 and 7.
cc2> storage remove_disk
index name size osd serial
--
1 /dev/sdb 745.2G 6 7 BTWA632602Q3800HGN
2 /dev/sdc 745.2G 8 9 BTWA6326030T800HGN
3 /dev/sdd 745.2G 10 11 BTWA632603GR800HGN
--
Enter the index of disk to be removed: 1
Disk removal mode:
1: safe
2: force
Enter index: 1
safe mode takes longer by attempting to migrate data on disk(s).
Enter 'YES' to confirm: YES
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
6 ssd 0.36349 1.00000 372 GiB 73 GiB 73 GiB 348 KiB 33 MiB 299 GiB 19.61 1.00 187 up
TOTAL 372 GiB 73 GiB 73 GiB 349 KiB 33 MiB 299 GiB 19.61
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
OSD(s) 6 have 72 pgs currently mapped to them.
OSD(s) 6 have 59 pgs currently mapped to them.
OSD(s) 6 have 36 pgs currently mapped to them.
OSD(s) 6 have 23 pgs currently mapped to them.
OSD(s) 6 have 19 pgs currently mapped to them.
OSD(s) 6 have 7 pgs currently mapped to them.
OSD(s) 6 have 3 pgs currently mapped to them.
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
7 ssd 0.36349 0 0 B 0 B 0 B 0 B 0 B 0 B 0 1.00 0 down
TOTAL 0 B 0 B 0 B 0 B 0 B 0 B 0
MIN/MAX VAR: -/- STDDEV: 0
Removed disk /dev/sdb.
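If you need to cross-check which OSDs live on a given drive outside the appliance CLI, standard Ceph tooling reports the same mapping. A short sketch, assuming the OSDs are provisioned with ceph-volume on LVM (the device path is an example):
# Cluster-wide device inventory: serial number, host:device, and the OSD daemons using each device
ceph device ls
# On the OSD node itself, list the OSDs provisioned on a specific drive
ceph-volume lvm list /dev/sdb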
Verify storage status after removal
Use the command storage > status.
As shown below, OSDs 6 and 7 on node cc2 are gone.
cc2> storage status
cluster:
id: c6e64c49-09cf-463b-9d1c-b6645b4b3b85
health: HEALTH_OK
services:
mon: 3 daemons, quorum cc1,cc2,cc3 (age 4h)
mgr: cc1(active, since 4h), standbys: cc3, cc2
mds: 1/1 daemons up, 1 standby, 1 hot standby
osd: 16 osds: 16 up (since 6m), 16 in (since 6m)
rgw: 3 daemons active (3 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 25 pools, 945 pgs
objects: 87.83k objects, 448 GiB
usage: 1.2 TiB used, 4.6 TiB / 5.8 TiB avail
pgs: 945 active+clean
io:
client: 159 KiB/s rd, 765 KiB/s wr, 61 op/s rd, 127 op/s wr
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 cc1 55.7G 316G 8 43.1k 5 24.7k exists,up
1 cc1 86.9G 285G 8 45.5k 1 0 exists,up
2 cc1 70.0G 302G 11 62.3k 28 107k exists,up
3 cc1 63.1G 309G 6 38.3k 0 0 exists,up
4 cc1 67.0G 305G 4 40.7k 1 0 exists,up
5 cc1 66.5G 305G 3 20.0k 1 90 exists,up
8 cc2 92.3G 279G 0 14.3k 1 0 exists,up
9 cc2 111G 260G 9 42.6k 51 202k exists,up
10 cc2 101G 270G 1 9216 2 29.0k exists,up
11 cc2 90.1G 282G 3 21.9k 1 0 exists,up
12 cc3 86.9G 285G 1 4728 2 2702 exists,up
13 cc3 53.3G 318G 0 0 0 0 exists,up
14 cc3 62.3G 309G 8 34.3k 11 55.1k exists,up
15 cc3 55.1G 317G 0 12.0k 2 0 exists,up
16 cc3 67.2G 304G 0 0 0 0 exists,up
17 cc3 83.3G 288G 2 27.2k 7 26.3k exists,up
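If the cluster has not yet returned to HEALTH_OK at this point, the rebalance triggered by the removal can still be in progress. A brief sketch of how it can be followed with standard Ceph commands:
# Overall health and recovery I/O, refreshed every five seconds
watch -n 5 ceph -s
# Per-OSD and per-host utilization, to confirm data is spreading back out evenly
ceph osd df tree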
Remove All Existing OSDs on a Node
Use the command storage > remove_exist. It walks through every data disk on the node and removes the associated OSDs one disk at a time using the selected mode.
cc1:storage> remove_exist
index name size osd serial
--
1 /dev/sda 745.2G 1 0 BTWA632602ZU800HGN
2 /dev/sdb 745.2G 3 2 BTWA632601X9800HGN
3 /dev/sdc 745.2G 5 4 BTWA632601RW800HGN
4 /dev/sdd 745.2G 7 6 BTWA6326038Z800HGN
5 /dev/sde 745.2G 9 8 BTWA632605GP800HGN
6 /dev/sdg 745.2G 11 10 BTWA632601U9800HGN
7 /dev/sdh 745.2G 13 12 BTWA632604RJ800HGN
8 /dev/sdi 745.2G 15 14 BTWA632602Q3800HGN
9 /dev/sdj 745.2G 16 17 BTWA63260373800HGN
10 /dev/sdk 745.2G 19 18 BTWA632605EV800HGN
11 /dev/sdl 745.2G 21 20 BTWA63250476800HGN
12 /dev/sdm 745.2G 22 23 BTWA6326047A800HGN
13 /dev/sdn 745.2G 24 25 BTWA632602Q0800HGN
14 /dev/sdo 744.6G 27 26 BTWA632605EV800HGN
--
Disk removal mode:
1: safe
2: force
Enter index: 1
safe mode takes longer by attempting to migrate data on disk(s).
Enter 'YES' to confirm: YES
--
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 ssd 0.36349 1.00000 372 GiB 4.9 GiB 4.9 GiB 2 KiB 21 MiB 367 GiB 1.32 1.00 61 up
TOTAL 372 GiB 4.9 GiB 4.9 GiB 2.7 KiB 21 MiB 367 GiB 1.32
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
OSD(s) 0 have 40 pgs currently mapped to them.
OSD(s) 0 have 36 pgs currently mapped to them.
OSD(s) 0 have 26 pgs currently mapped to them.
OSD(s) 0 have 14 pgs currently mapped to them.
marked down osd.0.
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
1 ssd 0.36349 1.00000 372 GiB 11 GiB 9.9 GiB 2 KiB 1.3 GiB 361 GiB 3.01 1.00 64 up
TOTAL 372 GiB 11 GiB 9.9 GiB 2.7 KiB 1.3 GiB 361 GiB 3.01
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
OSD(s) 1 have 33 pgs currently mapped to them.
OSD(s) 1 have 3 pgs currently mapped to them.
OSD(s) 1 have 2 pgs currently mapped to them.
marked down osd.1.
Removed disk /dev/sda.
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
2 ssd 0.36349 1.00000 372 GiB 15 GiB 15 GiB 4 KiB 94 MiB 358 GiB 3.95 1.00 76 up
TOTAL 372 GiB 15 GiB 15 GiB 4.8 KiB 94 MiB 358 GiB 3.95
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
OSD(s) 2 have 53 pgs currently mapped to them.
OSD(s) 2 have 46 pgs currently mapped to them.
OSD(s) 2 have 36 pgs currently mapped to them.
OSD(s) 2 have 24 pgs currently mapped to them.
OSD(s) 2 have 8 pgs currently mapped to them.
OSD(s) 2 have 4 pgs currently mapped to them.
Restored /dev/sdb: osd.2 as pgs could not be moved likely due to too few disks or little space in the failure domain.
Failed to remove disk /dev/sdb with safe mode for storage cannot become healthy
without it.
--
Processed 1 disk out of 14.
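The failure above means the PGs on /dev/sdb could not be re-homed without breaking the pools' placement rules, which matches the message about too few disks or too little space in the failure domain. Before retrying, it can help to review capacity and pool replication requirements; a minimal sketch using standard Ceph commands:
# Raw and per-pool capacity, including the space still required by replication
ceph df
# Utilization per OSD, grouped by host, to spot nearly full failure domains
ceph osd df tree
# Replicated size and CRUSH rule per pool; with a host failure domain, a pool of size N needs N hosts with free space
ceph osd pool ls detail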