Remove Hard Disks
Scenario: A hard disk failure has been identified. To maintain storage pool health, remove the failed disk from the cluster and initiate the recovery process.
Access the Desired Node via SSH
ssh <username>@192.168.1.x
Warning: Permanently added '192.168.1.x' (ECDSA) to the list of known hosts.
Password:
Verify storage status before removal
Use the command storage > status.
The storage status command reports the overall status of the storage cluster. If errors occur, the health row displays HEALTH_WARN. In this case, the disk hosting OSD 7 has failed: OSD 7 no longer shows the up state in the listing below, which produces the 1 osds down error message.
cc1> storage status
cluster:
id: c6e64c49-09cf-463b-9d1c-b6645b4b3b85
health: HEALTH_WARN
1 osds down
Degraded data redundancy: 1418/251600 objects degraded (0.564%), 72 pgs degraded
services:
mon: 3 daemons, quorum cc1,cc2,cc3 (age 101m)
mgr: cc1(active, since 101m), standbys: cc3, cc2
mds: 1/1 daemons up, 1 standby, 1 hot standby
osd: 18 osds: 17 up (since 60s), 18 in (since 2m)
rgw: 3 daemons active (3 hosts, 1 zones)
data:
pools: 25 pools, 945 pgs
objects: 87.83k objects, 449 GiB
usage: 1.2 TiB used, 5.3 TiB / 6.5 TiB avail
pgs: 1418/251600 objects degraded (0.564%)
728 active+clean
145 active+undersized
72 active+undersized+degraded
io:
client: 61 KiB/s rd, 503 KiB/s wr, 18 op/s rd, 76 op/s wr
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 cc1 46.6G 325G 15 376k 0 0 exists,up
1 cc1 85.7G 286G 4 23.2k 1 20 exists,up
2 cc1 75.4G 296G 16 72.7k 1 0 exists,up
3 cc1 70.8G 301G 5 60.7k 0 0 exists,up
4 cc1 62.5G 309G 7 54.3k 1 0 exists,up
5 cc1 66.9G 305G 0 0 1 90 exists,up
6 cc2 73.2G 298G 6 46.3k 0 0 exists,up
7 cc2 5575M 366G 0 0 0 0 exists
8 cc2 77.5G 294G 0 5734 1 0 exists,up
9 cc2 97.4G 274G 13 136k 0 0 exists,up
10 cc2 80.5G 291G 6 55.1k 0 0 exists,up
11 cc2 68.4G 303G 5 39.1k 1 0 exists,up
12 cc3 84.1G 288G 0 0 1 0 exists,up
13 cc3 52.1G 320G 1 8210 1 48.0k exists,up
14 cc3 62.8G 309G 13 116k 2 44.0k exists,up
15 cc3 51.1G 321G 5 56.7k 2 0 exists,up
16 cc3 69.0G 303G 3 37.7k 2 135 exists,up
17 cc3 87.1G 285G 8 372k 0 0 exists,up
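Because the storage service is backed by Ceph, the failed OSD can also be pinpointed with standard Ceph tooling when shell access to a node with the admin keyring is available. This is a minimal sketch for reference, not part of the storage CLI shown above:
# Explain the reason behind HEALTH_WARN, including which OSDs are down
ceph health detail
# List only the OSDs that are currently down, together with the host that owns them
ceph osd tree down
In this example, both commands should point at osd.7 on host cc2.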
Remove a Disk
Use the command storage > remove_disk.
There are two modes for removing a disk: safe mode and force mode.
- In safe mode, Ceph removes the disk gracefully by first migrating the existing data, transferring all PGs on the disk to other OSDs (roughly the manual Ceph procedure sketched after the warning below).
- In force mode, the cluster skips data migration and the specified drive is forcibly removed. This carries a high risk of corrupting the Ceph cluster.
Forcefully removing a disk may result in data corruption. Use force mode only if you understand the risks and require immediate removal of the data disk.
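For background, safe mode corresponds roughly to the standard Ceph drain-and-purge sequence below. The storage > remove_disk command automates these steps; the sketch only illustrates what migrating the data involves, using osd.7 as an example ID:
# Mark the OSD out so Ceph starts moving its PGs onto other OSDs
ceph osd out 7
# Wait until Ceph reports the OSD can be destroyed without losing redundancy
while ! ceph osd safe-to-destroy osd.7 ; do sleep 10 ; done
# Remove the OSD from the CRUSH map, its auth key, and the OSD map
ceph osd purge 7 --yes-i-really-mean-it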
After entering the command, the output shows that /dev/sdb is associated with OSDs 6 and 7.
cc2> storage remove_disk
index name size osd serial
--
1 /dev/sdb 745.2G 6 7 BTWA632602Q3800HGN
2 /dev/sdc 745.2G 8 9 BTWA6326030T800HGN
3 /dev/sdd 745.2G 10 11 BTWA632603GR800HGN
--
Enter the index of disk to be removed: 1
Disk removal mode:
1: safe
2: force
Enter index: 1
safe mode takes longer by attempting to migrate data on disk(s).
Enter 'YES' to confirm: YES
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
6 ssd 0.36349 1.00000 372 GiB 73 GiB 73 GiB 348 KiB 33 MiB 299 GiB 19.61 1.00 187 up
TOTAL 372 GiB 73 GiB 73 GiB 349 KiB 33 MiB 299 GiB 19.61
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
OSD(s) 6 have 72 pgs currently mapped to them.
OSD(s) 6 have 59 pgs currently mapped to them.
OSD(s) 6 have 36 pgs currently mapped to them.
OSD(s) 6 have 23 pgs currently mapped to them.
OSD(s) 6 have 19 pgs currently mapped to them.
OSD(s) 6 have 7 pgs currently mapped to them.
OSD(s) 6 have 3 pgs currently mapped to them.
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
7 ssd 0.36349 0 0 B 0 B 0 B 0 B 0 B 0 B 0 1.00 0 down
TOTAL 0 B 0 B 0 B 0 B 0 B 0 B 0
MIN/MAX VAR: -/- STDDEV: 0
Removed disk /dev/sdb.
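If you need to cross-check which OSDs live on a given drive outside the appliance CLI, standard Ceph tooling reports the same mapping. A short sketch, assuming the OSDs are provisioned with ceph-volume on LVM (the device path is an example):
# Cluster-wide device inventory: serial number, host:device, and the OSD daemons using each device
ceph device ls
# On the OSD node itself, list the OSDs provisioned on a specific drive
ceph-volume lvm list /dev/sdb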
Verify storage status after removal
Use the command storage > status.
As shown below, OSDs 6 and 7 on node cc2 are gone.
cc2> storage status
cluster:
id: c6e64c49-09cf-463b-9d1c-b6645b4b3b85
health: HEALTH_OK
services:
mon: 3 daemons, quorum cc1,cc2,cc3 (age 4h)
mgr: cc1(active, since 4h), standbys: cc3, cc2
mds: 1/1 daemons up, 1 standby, 1 hot standby
osd: 16 osds: 16 up (since 6m), 16 in (since 6m)
rgw: 3 daemons active (3 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 25 pools, 945 pgs
objects: 87.83k objects, 448 GiB
usage: 1.2 TiB used, 4.6 TiB / 5.8 TiB avail
pgs: 945 active+clean
io:
client: 159 KiB/s rd, 765 KiB/s wr, 61 op/s rd, 127 op/s wr
ID HOST USED AVAIL WR OPS WR DATA RD OPS RD DATA STATE
0 cc1 55.7G 316G 8 43.1k 5 24.7k exists,up
1 cc1 86.9G 285G 8 45.5k 1 0 exists,up
2 cc1 70.0G 302G 11 62.3k 28 107k exists,up
3 cc1 63.1G 309G 6 38.3k 0 0 exists,up
4 cc1 67.0G 305G 4 40.7k 1 0 exists,up
5 cc1 66.5G 305G 3 20.0k 1 90 exists,up
8 cc2 92.3G 279G 0 14.3k 1 0 exists,up
9 cc2 111G 260G 9 42.6k 51 202k exists,up
10 cc2 101G 270G 1 9216 2 29.0k exists,up
11 cc2 90.1G 282G 3 21.9k 1 0 exists,up
12 cc3 86.9G 285G 1 4728 2 2702 exists,up
13 cc3 53.3G 318G 0 0 0 0 exists,up
14 cc3 62.3G 309G 8 34.3k 11 55.1k exists,up
15 cc3 55.1G 317G 0 12.0k 2 0 exists,up
16 cc3 67.2G 304G 0 0 0 0 exists,up
17 cc3 83.3G 288G 2 27.2k 7 26.3k exists,up
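If the cluster has not yet returned to HEALTH_OK at this point, the rebalance triggered by the removal can still be in progress. A brief sketch of how it can be followed with standard Ceph commands:
# Overall health and recovery I/O, refreshed every five seconds
watch -n 5 ceph -s
# Per-OSD and per-host utilization, to confirm data is spreading back out evenly
ceph osd df tree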
Remove All Existing OSDs on a Node
Use the command storage > remove_exist. It walks through every data disk on the node and removes the associated OSDs one disk at a time using the selected mode.
cc1:storage> remove_exist
index name size osd serial
--
1 /dev/sda 745.2G 1 0 BTWA632602ZU800HGN
2 /dev/sdb 745.2G 3 2 BTWA632601X9800HGN
3 /dev/sdc 745.2G 5 4 BTWA632601RW800HGN
4 /dev/sdd 745.2G 7 6 BTWA6326038Z800HGN
5 /dev/sde 745.2G 9 8 BTWA632605GP800HGN
6 /dev/sdg 745.2G 11 10 BTWA632601U9800HGN
7 /dev/sdh 745.2G 13 12 BTWA632604RJ800HGN
8 /dev/sdi 745.2G 15 14 BTWA632602Q3800HGN
9 /dev/sdj 745.2G 16 17 BTWA63260373800HGN
10 /dev/sdk 745.2G 19 18 BTWA632605EV800HGN
11 /dev/sdl 745.2G 21 20 BTWA63250476800HGN
12 /dev/sdm 745.2G 22 23 BTWA6326047A800HGN
13 /dev/sdn 745.2G 24 25 BTWA632602Q0800HGN
14 /dev/sdo 744.6G 27 26 BTWA632605EV800HGN
--
Disk removal mode:
1: safe
2: force
Enter index: 1
safe mode takes longer by attempting to migrate data on disk(s).
Enter 'YES' to confirm: YES
--
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 ssd 0.36349 1.00000 372 GiB 4.9 GiB 4.9 GiB 2 KiB 21 MiB 367 GiB 1.32 1.00 61 up
TOTAL 372 GiB 4.9 GiB 4.9 GiB 2.7 KiB 21 MiB 367 GiB 1.32
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
OSD(s) 0 have 40 pgs currently mapped to them.
OSD(s) 0 have 36 pgs currently mapped to them.
OSD(s) 0 have 26 pgs currently mapped to them.
OSD(s) 0 have 14 pgs currently mapped to them.
marked down osd.0.
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
1 ssd 0.36349 1.00000 372 GiB 11 GiB 9.9 GiB 2 KiB 1.3 GiB 361 GiB 3.01 1.00 64 up
TOTAL 372 GiB 11 GiB 9.9 GiB 2.7 KiB 1.3 GiB 361 GiB 3.01
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
OSD(s) 1 have 33 pgs currently mapped to them.
OSD(s) 1 have 3 pgs currently mapped to them.
OSD(s) 1 have 2 pgs currently mapped to them.
marked down osd.1.
Removed disk /dev/sda.
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
2 ssd 0.36349 1.00000 372 GiB 15 GiB 15 GiB 4 KiB 94 MiB 358 GiB 3.95 1.00 76 up
TOTAL 372 GiB 15 GiB 15 GiB 4.8 KiB 94 MiB 358 GiB 3.95
MIN/MAX VAR: 1.00/1.00 STDDEV: 0
OSD(s) 2 have 53 pgs currently mapped to them.
OSD(s) 2 have 46 pgs currently mapped to them.
OSD(s) 2 have 36 pgs currently mapped to them.
OSD(s) 2 have 24 pgs currently mapped to them.
OSD(s) 2 have 8 pgs currently mapped to them.
OSD(s) 2 have 4 pgs currently mapped to them.
Restored /dev/sdb: osd.2 as pgs could not be moved likely due to too few disks or little space in the failure domain.
Failed to remove disk /dev/sdb with safe mode for storage cannot become healthy
without it.
--
Processed 1 disk out of 14.
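The failure above means the PGs on /dev/sdb could not be re-homed without breaking the pools' placement rules, which matches the message about too few disks or too little space in the failure domain. Before retrying, it can help to review capacity and pool replication requirements; a minimal sketch using standard Ceph commands:
# Raw and per-pool capacity, including the space still required by replication
ceph df
# Utilization per OSD, grouped by host, to spot nearly full failure domains
ceph osd df tree
# Replicated size and CRUSH rule per pool; with a host failure domain, a pool of size N needs N hosts with free space
ceph osd pool ls detail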