Version: 2.5

Remove disk from a node

Storage status

Story: If you discover a failed hard disk, you need to remove it from the cluster and restore the health of the storage pool. To proceed:

  • Check the storage status before starting.
  • As shown in the output below, two OSDs (Object Storage Daemons) are down because of the failed hard disk: OSD numbers 4 and 5 on the node with hostname cc1. (A sketch of equivalent checks with the stock Ceph tools follows the output.)
    cc1:storage> status
  cluster:
    id:     c6e64c49-09cf-463b-9d1c-b6645b4b3b85
    health: HEALTH_WARN
            2 osds down
            Degraded data redundancy: 1611/44204 objects degraded (3.644%), 106 pgs degraded

  services:
    mon: 3 daemons, quorum cc1,cc2,cc3 (age 8d)
    mgr: cc1(active, since 8d), standbys: cc2, cc3
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 18 osds: 16 up (since 2d), 18 in (since 2d)
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   25 pools, 753 pgs
    objects: 149.93k objects, 785 GiB
    usage:   2.2 TiB used, 5.6 TiB / 7.9 TiB avail
    pgs:     753 active+clean

  io:
    client: 5.3 MiB/s rd, 308 KiB/s wr, 171 op/s rd, 56 op/s wr

ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 cc1 65.7G 380G 6 40.0k 7 161k exists,up
1 cc1 181G 265G 8 32.7k 1 58.4k exists,up
2 cc1 162G 283G 0 4096 15 604k exists,up
3 cc1 133G 313G 0 1638 2 29.6k exists,up
4 cc1 91.8G 354G 14 97.5k 6 39.2k exists
5 cc1 130G 315G 8 39.1k 3 88.9k exists
6 cc2 96.0G 350G 9 50.3k 3 160k exists,up
7 cc2 165G 281G 0 0 1 89.6k exists,up
8 cc2 75.8G 370G 0 6553 1 25.6k exists,up
9 cc2 199G 247G 0 3276 3 172k exists,up
10 cc2 122G 324G 2 13.5k 9 510k exists,up
11 cc2 95.3G 351G 1 4096 6 126k exists,up
12 cc3 184G 262G 3 12.0k 1 25.6k exists,up
13 cc3 93.6G 353G 0 0 0 5734 exists,up
14 cc3 67.8G 378G 12 71.1k 13 364k exists,up
15 cc3 92.6G 354G 0 819 0 0 exists,up
16 cc3 142G 303G 0 819 2 24.0k exists,up
17 cc3 179G 267G 0 2457 5 99.2k exists,up
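
The status command above aggregates standard Ceph output. If you also have shell access to a cluster node with the Ceph admin keyring (an assumption; the appliance CLI may be the only supported interface on your system), the same facts can be cross-checked with the stock ceph tools, for example:

    ceph health detail    # names the exact OSDs that are down and lists the degraded PGs
    ceph osd tree down    # shows only the down OSDs and which host they sit on
    ceph osd find 4       # CRUSH location / host for OSD 4 (repeat for OSD 5)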

Remove disk

  • Connect to the host cc1.
  • Run the remove_disk CLI command; it shows that /dev/sde, at index 3, is associated with OSD IDs 4 and 5.
  • Remove /dev/sde from the Ceph pool. (A sketch of the equivalent manual Ceph steps follows the command output below.)
  • Physically remove the hard disk from the node.
  cc1:storage> remove_disk
index name size osd serial
--
1 /dev/sda 894.3G 0 1 S40FNA0M800607
2 /dev/sdc 894.3G 2 3 S40FNA0M800598
3 /dev/sde 894.3G 4 5 S40FNA0M800608
--
Enter the index of disk to be removed: 3
Disk removal mode (safe/force): force
force mode immediately destroys disk data without taking into accounts of
storage status so USE IT AT YOUR OWN RISK.
Enter 'YES' to confirm: YES
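
The remove_disk command drives the whole removal from the appliance CLI. For readers familiar with plain Ceph, removing the two dead OSDs corresponds roughly to the standard manual procedure sketched below; this is an illustration of what upstream Ceph provides, not a description of what remove_disk runs internally, and it assumes root shell access on cc1:

    # Mark the dead OSDs out so their placement groups are re-homed.
    ceph osd out 4 5
    # Remove them from the CRUSH map, the OSD map and the auth database.
    ceph osd purge 4 --yes-i-really-mean-it
    ceph osd purge 5 --yes-i-really-mean-it
    # Optionally wipe the old device so it can be reused elsewhere.
    ceph-volume lvm zap /dev/sde --destroy
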
  • Check the status of the storage pool; Ceph is recovering the data automatically. (A note on watching the recovery with the stock Ceph tools follows the output.)
  cc1:storage> status
  cluster:
    id:     c6e64c49-09cf-463b-9d1c-b6645b4b3b85
    health: HEALTH_WARN
            Degraded data redundancy: 6075/438706 objects degraded (1.385%), 8 pgs degraded, 8 pgs undersized

  services:
    mon: 3 daemons, quorum cc1,cc2,cc3 (age 8d)
    mgr: cc1(active, since 8d), standbys: cc2, cc3
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 16 osds: 16 up (since 10m), 16 in (since 10m); 15 remapped pgs
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   25 pools, 753 pgs
    objects: 149.94k objects, 785 GiB
    usage:   2.2 TiB used, 4.8 TiB / 7.0 TiB avail
    pgs:     6075/438706 objects degraded (1.385%)
             5463/438706 objects misplaced (1.245%)
             738 active+clean
             8   active+undersized+degraded+remapped+backfilling
             7   active+remapped+backfilling

  io:
    client:   4.4 MiB/s rd, 705 KiB/s wr, 87 op/s rd, 83 op/s wr
    recovery: 127 MiB/s, 27 objects/s

ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 cc1 141G 305G 1 28.0k 1 42.4k exists,up
1 cc1 177G 268G 11 88.0k 3 26.3k exists,up
2 cc1 212G 233G 2 12.7k 0 0 exists,up
3 cc1 193G 253G 3 31.1k 7 634k exists,up
6 cc2 86.0G 360G 9 40.0k 2 27.1k exists,up
7 cc2 179G 267G 7 184k 2 119k exists,up
8 cc2 90.8G 355G 0 18.3k 19 1553k exists,up
9 cc2 201G 245G 8 35.1k 16 1450k exists,up
10 cc2 108G 337G 6 51.1k 11 755k exists,up
11 cc2 98.5G 348G 0 6553 2 41.6k exists,up
12 cc3 201G 245G 16 100k 3 230k exists,up
13 cc3 122G 323G 0 0 0 0 exists,up
14 cc3 88.0G 358G 15 76.0k 47 2970k exists,up
15 cc3 100G 346G 7 183k 14 1286k exists,up
16 cc3 127G 319G 5 28.0k 15 659k exists,up
17 cc3 132G 314G 23 225k 9 491k exists,up
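
Recovery continues in the background until all placement groups are active+clean. With shell access to a node that has the Ceph client configured (again an assumption), progress can be followed without repeatedly retyping the status command, e.g.:

    watch -n 10 ceph -s   # refresh the cluster summary every 10 seconds
    ceph -w               # stream health and recovery events as they happen
    ceph pg stat          # one-line summary of degraded / backfilling PGs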

Results

  • Wait a while and check the status again.
  • The failed hard disk has been removed successfully and the health status is HEALTH_OK. (A few final verification commands are sketched after the output.)
  cc1:storage> status
  cluster:
    id:     c6e64c49-09cf-463b-9d1c-b6645b4b3b85
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum cc1,cc2,cc3 (age 8d)
    mgr: cc1(active, since 8d), standbys: cc2, cc3
    mds: 1/1 daemons up, 1 standby, 1 hot standby
    osd: 16 osds: 16 up (since 21m), 16 in (since 21m)
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   25 pools, 753 pgs
    objects: 149.99k objects, 786 GiB
    usage:   2.2 TiB used, 4.8 TiB / 7.0 TiB avail
    pgs:     753 active+clean

  io:
    client: 25 KiB/s rd, 304 KiB/s wr, 19 op/s rd, 43 op/s wr

ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 cc1 148G 298G 0 0 0 0 exists,up
1 cc1 176G 269G 5 23.1k 1 0 exists,up
2 cc1 202G 243G 0 28.0k 1 0 exists,up
3 cc1 220G 225G 0 3276 0 0 exists,up
6 cc2 86.1G 360G 4 20.7k 0 0 exists,up
7 cc2 180G 266G 0 0 0 0 exists,up
8 cc2 89.2G 357G 7 49.5k 2 10.3k exists,up
9 cc2 201G 245G 0 819 0 0 exists,up
10 cc2 108G 337G 1 7372 0 5734 exists,up
11 cc2 99.1G 347G 0 12.7k 0 0 exists,up
12 cc3 199G 247G 1 5734 1 0 exists,up
13 cc3 112G 333G 4 22.3k 0 0 exists,up
14 cc3 86.3G 360G 1 18.3k 2 90 exists,up
15 cc3 98.7G 347G 0 16.0k 1 0 exists,up
16 cc3 128G 318G 1 4915 2 9027 exists,up
17 cc3 141G 305G 2 22.3k 0 0 exists,up
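
As a final sanity check, and again assuming direct access to the ceph CLI on a cluster node, the following confirm that the removal left nothing behind:

    ceph health detail   # should report HEALTH_OK with no extra detail lines
    ceph osd tree        # OSDs 4 and 5 should no longer appear under host cc1
    ceph osd df tree     # per-OSD utilization; data is rebalanced across the 16 remaining OSDs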