Version: 2.2

Storage Node Replacement Guide

Remove a storage node from the cluster

Make sure your storage data replication factor is either 2 or 3, so that at least one copy of every object survives while a node is removed and rebuilt.
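The reasoning behind this requirement can be sketched with a little arithmetic: with replication 2 or 3, each object keeps at least one live replica when a single host leaves, which is what lets Ceph re-replicate the data onto the remaining nodes. A minimal illustration (the function name and logic are illustrative, not part of the appliance CLI):

```python
def survives_host_loss(replication_factor: int, hosts_lost: int = 1) -> bool:
    """Return True if at least one replica of every object survives.

    Assumes each replica lives on a distinct host (the usual CRUSH
    failure-domain default for replicated pools).
    """
    return replication_factor - hosts_lost >= 1

# Replication 2 or 3 tolerates removing one storage node:
assert survives_host_loss(2) and survives_host_loss(3)
# Replication 1 would lose data when its only host is removed:
assert not survives_host_loss(1)
```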

Connect to controller

$ ssh root@192.168.1.x
Warning: Permanently added '192.168.1.x' (ECDSA) to the list of known hosts.
Password:

Check the storage status

control01> storage
control01:storage> status
cluster:
id: c6e64c49-09cf-463b-9d1c-b6645b4b3b85
health: HEALTH_OK

services:
mon: 3 daemons, quorum control01,control02,control03
mgr: control01(active), standbys: control02, control03
mds: cephfs-1/1/1 up {0=control01=up:active}, 2 up:standby
osd: 24 osds: 24 up, 24 in
rgw: 3 daemons active

data:
pools: 23 pools, 1837 pgs
objects: 10.50k objects, 12.7GiB
usage: 31.3GiB used, 3.74TiB / 3.77TiB avail
pgs: 1837 active+clean

io:
client: 15.5KiB/s rd, 0B/s wr, 15op/s rd, 10op/s wr

+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | control01 | 2063M | 117G | 0 | 0 | 2 | 61 | exists,up |
| 1 | control01 | 2020M | 117G | 0 | 0 | 1 | 36 | exists,up |
| 2 | control01 | 1089M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 3 | control01 | 1081M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 4 | control02 | 1656M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 5 | control02 | 2073M | 116G | 0 | 0 | 0 | 0 | exists,up |
| 6 | control02 | 1089M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 7 | control02 | 1089M | 135G | 0 | 0 | 4 | 0 | exists,up |
| 8 | control03 | 1781M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 9 | control03 | 1961M | 117G | 0 | 0 | 7 | 157 | exists,up |
| 10 | control03 | 1089M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 11 | control03 | 1089M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 12 | compute01 | 1462M | 56.5G | 0 | 0 | 0 | 0 | exists,up |
| 13 | compute01 | 1400M | 56.6G | 0 | 0 | 0 | 0 | exists,up |
| 14 | compute01 | 1334M | 56.7G | 0 | 0 | 0 | 6 | exists,up |
| 15 | compute01 | 1426M | 56.6G | 0 | 0 | 0 | 0 | exists,up |
| 16 | compute01 | 1101M | 464G | 0 | 0 | 0 | 19 | exists,up |
| 17 | compute01 | 1089M | 464G | 0 | 0 | 0 | 0 | exists,up |
| 18 | storage01 | 1040M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 19 | storage01 | 1040M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 20 | storage01 | 1040M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 21 | storage01 | 1048M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 22 | storage01 | 1081M | 464G | 0 | 0 | 0 | 0 | exists,up |
| 23 | storage01 | 1105M | 464G | 0 | 0 | 0 | 0 | exists,up |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
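Before removing anything, it is worth confirming that every OSD is up and seeing how they are distributed across hosts. A minimal sketch that parses a saved copy of the table above (the sample text is abbreviated from the output; it is not live cluster data):

```python
# Parse a Cube "storage status" OSD table and group OSD ids by host.
SAMPLE = """\
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
| 0 | control01 | 2063M | 117G | 0 | 0 | 2 | 61 | exists,up |
| 18 | storage01 | 1040M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
"""

def osds_by_host(table: str) -> dict[str, list[int]]:
    hosts: dict[str, list[int]] = {}
    for line in table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 9 or not cells[0].isdigit():
            continue  # skip the header row and +---+ separators
        assert "up" in cells[8].split(","), f"OSD {cells[0]} is not up"
        hosts.setdefault(cells[1], []).append(int(cells[0]))
    return hosts

print(osds_by_host(SAMPLE))  # → {'control01': [0], 'storage01': [18]}
```

Running this over the full table should yield 24 OSDs spread across the five hosts, all in the `exists,up` state.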

Remove node

Removing storage01 from the cluster

control01:cluster> remove_node
1: compute01
2: storage01
3: control01
4: control03
5: control02
Enter index: 2
this command is only applicable for compute or storage nodes
make sure its running instances have been properly terminated or migrated
shutdown the target host before proceeding
Enter 'YES' to confirm: YES
control01:cluster>

Check the storage status

The storage01 node has been removed from the cluster. The cluster reports HEALTH_WARN while Ceph rebalances data across the remaining 18 OSDs; this is expected and should clear once recovery completes.

control01> storage
control01:storage> status
cluster:
id: c6e64c49-09cf-463b-9d1c-b6645b4b3b85
health: HEALTH_WARN
Reduced data availability: 2 pgs inactive
Degraded data redundancy: 139/21222 objects degraded (0.655%), 10 pgs degraded

services:
mon: 3 daemons, quorum control01,control02,control03
mgr: control01(active), standbys: control02, control03
mds: cephfs-1/1/1 up {0=control01=up:active}, 2 up:standby
osd: 18 osds: 18 up, 18 in; 510 remapped pgs
rgw: 3 daemons active

data:
pools: 23 pools, 1837 pgs
objects: 10.50k objects, 12.7GiB
usage: 25.4GiB used, 2.61TiB / 2.63TiB avail
pgs: 10.670% pgs unknown
0.435% pgs not active
139/21222 objects degraded (0.655%)
1406 active+clean
214 active+clean+remapped
196 unknown
9 active+recovery_wait+degraded
5 activating+remapped
3 activating
3 active+undersized
1 active+undersized+degraded+remapped+backfilling

io:
client: 13.1KiB/s rd, 0B/s wr, 13op/s rd, 9op/s wr
recovery: 1.11MiB/s, 4objects/s

+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | control01 | 2060M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 1 | control01 | 2025M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 2 | control01 | 1093M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 3 | control01 | 1086M | 135G | 10 | 0 | 5 | 69 | exists,up |
| 4 | control02 | 1668M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 5 | control02 | 2086M | 116G | 0 | 0 | 0 | 0 | exists,up |
| 6 | control02 | 1093M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 7 | control02 | 1094M | 135G | 0 | 0 | 4 | 0 | exists,up |
| 8 | control03 | 1785M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 9 | control03 | 1957M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 10 | control03 | 1093M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 11 | control03 | 1094M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 12 | compute01 | 1463M | 56.5G | 0 | 0 | 0 | 0 | exists,up |
| 13 | compute01 | 1402M | 56.6G | 0 | 0 | 0 | 0 | exists,up |
| 14 | compute01 | 1336M | 56.7G | 0 | 0 | 0 | 0 | exists,up |
| 15 | compute01 | 1427M | 56.6G | 0 | 0 | 0 | 0 | exists,up |
| 16 | compute01 | 1106M | 464G | 0 | 0 | 0 | 0 | exists,up |
| 17 | compute01 | 1094M | 464G | 0 | 0 | 0 | 0 | exists,up |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
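The degraded figure in the warning above is simply the ratio of degraded object copies to the total tracked by the cluster, and the percentage it prints can be reproduced directly:

```python
# "139/21222 objects degraded" from the HEALTH_WARN status output
degraded, total = 139, 21222
pct = degraded / total * 100
print(f"{pct:.3f}%")  # → 0.655%, matching the reported value
```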

Run check_repair to make sure all services are OK

control03> cluster
control03:cluster> check_repair
Service Status Report
ClusterLink ok [ link(v) clock(v) dns(v) ]
ClusterSys ok [ bootstrap(v) license(v) ]
ClusterSettings ok [ etcd(v) ]
HaCluster FIXING [ hacluster(3) ]
ok [ hacluster(f) ]
MsgQueue ok [ rabbitmq(v) ]
IaasDb ok [ mysql(v) ]
VirtualIp ok [ vip(v) haproxy_ha(v) ]
Storage ok [ ceph(v) ceph_mon(v) ceph_mgr(v) ceph_mds(v) ceph_osd(v) ceph_rgw(v) rbd_target(v) ]
ApiService ok [ haproxy(v) httpd(v) lmi(v) memcache(v) ]
SingleSignOn ok [ keycloak(v) ]
Compute ok [ nova(v) ]
Baremetal ok [ ironic(v) ]
Network ok [ neutron(v) ]
Image ok [ glance(v) ]
BlockStor ok [ cinder(v) ]
FileStor ok [ manila(v) ]
ObjectStor ok [ swift(v) ]
Orchestration ok [ heat(v) ]
LBaaS ok [ octavia(v) ]
DNSaaS ok [ designate(v) ]
K8SaaS ok [ k3s(v) rancher(v) ]
InstanceHa ok [ masakari(v) ]
DisasterRecovery ok [ freezer(v) ]
BusinessLogic ok [ mistral(v) murano(v) cloudkitty(v) senlin(v) watcher(v) ]
ApiManager ok [ tyk(v) redis(v) mongodb(v) ]
DataPipe ok [ zookeeper(v) kafka(v) ]
Metrics ok [ monasca(v) telegraf(v) grafana(v) ]
LogAnalytics ok [ filebeat(v) auditbeat(v) logstash(v) es(v) kibana(v) ]
Notifications ok [ influxdb(v) kapacitor(v) ]
control03:cluster>
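When scanning a long report like this, it helps to pull out any service whose status is not `ok`. A minimal sketch that filters a saved copy of the report text (the sample lines are abbreviated from the output above):

```python
SAMPLE = """\
Service Status Report
ClusterLink ok [ link(v) clock(v) dns(v) ]
HaCluster FIXING [ hacluster(3) ]
MsgQueue ok [ rabbitmq(v) ]
"""

def not_ok(report: str) -> list[tuple[str, str]]:
    """Return (service, status) pairs whose status is not 'ok'."""
    flagged = []
    for line in report.splitlines():
        parts = line.split()
        # Require a "[ ... ]" detail section, and skip the bare
        # "ok [ ... ]" continuation lines that follow a FIXING entry.
        if len(parts) >= 2 and "[" in line and parts[0] != "ok" and parts[1] != "ok":
            flagged.append((parts[0], parts[1]))
    return flagged

print(not_ok(SAMPLE))  # → [('HaCluster', 'FIXING')]
```

A transient FIXING entry that is followed by an `ok [ ... ]` line, as in the report above, means check_repair repaired the service during the run.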

Back up storage01 policies

From your local PC's terminal:

$ scp -r root@storage01_IPADDRESS:/etc/policies Downloads/storage01_policy
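Before shutting the node down, it is a cheap safeguard to confirm the copied policy files match the originals, for example by comparing per-file checksums of the two trees. A minimal sketch, assuming both directories are accessible on the local machine (the paths are illustrative):

```python
import hashlib
from pathlib import Path

def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

# Example usage: compare the backup against a second copy, e.g. one
# fetched again via scp; identical trees produce identical digest maps.
# tree_digests(Path("Downloads/storage01_policy")) == tree_digests(Path("..."))
```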

Shut down storage01

$ ssh root@192.168.1.121
Warning: Permanently added '192.168.1.121' (ECDSA) to the list of known hosts.
Password:
Welcome to the Cube Appliance
Enter "help" for a list of available commands
storage01> shutdown
Enter 'YES' to confirm: YES
Connection to 192.168.1.121 closed by remote host.
Connection to 192.168.1.121 closed.

Adding Storage Host

Prepare a new node with CubeOS installed

Configuration

Reconfigure the new storage01 node by following any of the options listed below:

Connect to controller

$ ssh root@192.168.1.x
Warning: Permanently added '192.168.1.x' (ECDSA) to the list of known hosts.
Password:

Check & Repair services

control01> cluster
control01:cluster> check_repair
Service Status Report
ClusterLink ok [ link(v) clock(v) dns(v) ]
ClusterSys ok [ bootstrap(v) license(v) ]
ClusterSettings ok [ etcd(v) ]
HaCluster FIXING [ hacluster(3) ]
ok [ hacluster(f) ]
MsgQueue ok [ rabbitmq(v) ]
IaasDb ok [ mysql(v) ]
VirtualIp ok [ vip(v) haproxy_ha(v) ]
Storage ok [ ceph(v) ceph_mon(v) ceph_mgr(v) ceph_mds(v) ceph_osd(v) ceph_rgw(v) rbd_target(v) ]
ApiService ok [ haproxy(v) httpd(v) lmi(v) memcache(v) ]
SingleSignOn ok [ keycloak(v) ]
Compute ok [ nova(v) ]
Baremetal ok [ ironic(v) ]
Network ok [ neutron(v) ]
Image ok [ glance(v) ]
BlockStor ok [ cinder(v) ]
FileStor ok [ manila(v) ]
ObjectStor ok [ swift(v) ]
Orchestration ok [ heat(v) ]
LBaaS ok [ octavia(v) ]
DNSaaS ok [ designate(v) ]
K8SaaS ok [ k3s(v) rancher(v) ]
InstanceHa ok [ masakari(v) ]
DisasterRecovery ok [ freezer(v) ]
BusinessLogic ok [ mistral(v) murano(v) cloudkitty(v) senlin(v) watcher(v) ]
ApiManager ok [ tyk(v) redis(v) mongodb(v) ]
DataPipe ok [ zookeeper(v) kafka(v) ]
Metrics ok [ monasca(v) telegraf(v) grafana(v) ]
LogAnalytics ok [ filebeat(v) auditbeat(v) logstash(v) es(v) kibana(v) ]
Notifications ok [ influxdb(v) kapacitor(v) ]
control01:cluster>

Connect to storage01

$ ssh root@192.168.1.121
Warning: Permanently added '192.168.1.121' (ECDSA) to the list of known hosts.
Password:

Check the storage status. Once the new node has joined, all 24 OSDs should be up and the cluster should return to HEALTH_OK.

storage01> storage
storage01:storage> status
cluster:
id: c6e64c49-09cf-463b-9d1c-b6645b4b3b85
health: HEALTH_OK

services:
mon: 3 daemons, quorum control01,control02,control03
mgr: control01(active), standbys: control02, control03
mds: cephfs-1/1/1 up {0=control01=up:active}, 2 up:standby
osd: 24 osds: 24 up, 24 in
rgw: 3 daemons active

data:
pools: 23 pools, 1837 pgs
objects: 10.50k objects, 12.7GiB
usage: 31.6GiB used, 3.74TiB / 3.77TiB avail
pgs: 1837 active+clean

io:
client: 42.3KiB/s rd, 49op/s rd, 0op/s wr

+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | control01 | 2063M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 1 | control01 | 2036M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 2 | control01 | 1088M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 3 | control01 | 1088M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 4 | control02 | 1663M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 5 | control02 | 2080M | 116G | 0 | 0 | 0 | 0 | exists,up |
| 6 | control02 | 1096M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 7 | control02 | 1096M | 135G | 0 | 0 | 4 | 0 | exists,up |
| 8 | control03 | 1788M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 9 | control03 | 1952M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 10 | control03 | 1096M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 11 | control03 | 1096M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 12 | compute01 | 1464M | 56.5G | 0 | 0 | 0 | 0 | exists,up |
| 13 | compute01 | 1403M | 56.6G | 0 | 0 | 0 | 0 | exists,up |
| 14 | compute01 | 1337M | 56.7G | 0 | 0 | 0 | 0 | exists,up |
| 15 | compute01 | 1428M | 56.6G | 0 | 0 | 0 | 0 | exists,up |
| 16 | compute01 | 1109M | 464G | 0 | 0 | 0 | 0 | exists,up |
| 17 | compute01 | 1096M | 464G | 0 | 0 | 0 | 0 | exists,up |
| 18 | storage01 | 1042M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 19 | storage01 | 1042M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 20 | storage01 | 1042M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 21 | storage01 | 1042M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 22 | storage01 | 1096M | 464G | 0 | 0 | 0 | 0 | exists,up |
| 23 | storage01 | 1112M | 464G | 0 | 0 | 0 | 0 | exists,up |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+