Version: 2.2

Storage Node Replacement Guide

Remove a storage node from the cluster

Make sure your storage data replication factor is either 2 or 3, so that at least one copy of every object survives while a node is removed and rebuilt.
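The reasoning behind this requirement can be sketched with a little arithmetic: with replication 2 or 3, each object keeps at least one live replica when a single host leaves, which is what lets Ceph re-replicate the data onto the remaining nodes. A minimal illustration (the function name and logic are illustrative, not part of the appliance CLI):

```python
def survives_host_loss(replication_factor: int, hosts_lost: int = 1) -> bool:
    """Return True if at least one replica of every object survives.

    Assumes each replica lives on a distinct host (the usual CRUSH
    failure-domain default for replicated pools).
    """
    return replication_factor - hosts_lost >= 1

# Replication 2 or 3 tolerates removing one storage node:
assert survives_host_loss(2) and survives_host_loss(3)
# Replication 1 would lose data when its only host is removed:
assert not survives_host_loss(1)
```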

Connect to controller

$ ssh root@192.168.1.x
Warning: Permanently added '192.168.1.x' (ECDSA) to the list of known hosts.
Password:

Check the storage status

control01> storage
control01:storage> status
cluster:
id: c6e64c49-09cf-463b-9d1c-b6645b4b3b85
health: HEALTH_OK

services:
mon: 3 daemons, quorum control01,control02,control03
mgr: control01(active), standbys: control02, control03
mds: cephfs-1/1/1 up {0=control01=up:active}, 2 up:standby
osd: 24 osds: 24 up, 24 in
rgw: 3 daemons active

data:
pools: 23 pools, 1837 pgs
objects: 10.50k objects, 12.7GiB
usage: 31.3GiB used, 3.74TiB / 3.77TiB avail
pgs: 1837 active+clean

io:
client: 15.5KiB/s rd, 0B/s wr, 15op/s rd, 10op/s wr

+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | control01 | 2063M | 117G | 0 | 0 | 2 | 61 | exists,up |
| 1 | control01 | 2020M | 117G | 0 | 0 | 1 | 36 | exists,up |
| 2 | control01 | 1089M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 3 | control01 | 1081M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 4 | control02 | 1656M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 5 | control02 | 2073M | 116G | 0 | 0 | 0 | 0 | exists,up |
| 6 | control02 | 1089M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 7 | control02 | 1089M | 135G | 0 | 0 | 4 | 0 | exists,up |
| 8 | control03 | 1781M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 9 | control03 | 1961M | 117G | 0 | 0 | 7 | 157 | exists,up |
| 10 | control03 | 1089M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 11 | control03 | 1089M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 12 | compute01 | 1462M | 56.5G | 0 | 0 | 0 | 0 | exists,up |
| 13 | compute01 | 1400M | 56.6G | 0 | 0 | 0 | 0 | exists,up |
| 14 | compute01 | 1334M | 56.7G | 0 | 0 | 0 | 6 | exists,up |
| 15 | compute01 | 1426M | 56.6G | 0 | 0 | 0 | 0 | exists,up |
| 16 | compute01 | 1101M | 464G | 0 | 0 | 0 | 19 | exists,up |
| 17 | compute01 | 1089M | 464G | 0 | 0 | 0 | 0 | exists,up |
| 18 | storage01 | 1040M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 19 | storage01 | 1040M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 20 | storage01 | 1040M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 21 | storage01 | 1048M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 22 | storage01 | 1081M | 464G | 0 | 0 | 0 | 0 | exists,up |
| 23 | storage01 | 1105M | 464G | 0 | 0 | 0 | 0 | exists,up |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
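Before removing anything, it is worth confirming that every OSD is up and seeing how they are distributed across hosts. A minimal sketch that parses a saved copy of the table above (the sample text is abbreviated from the output; it is not live cluster data):

```python
# Parse a Cube "storage status" OSD table and group OSD ids by host.
SAMPLE = """\
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
| 0 | control01 | 2063M | 117G | 0 | 0 | 2 | 61 | exists,up |
| 18 | storage01 | 1040M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
"""

def osds_by_host(table: str) -> dict[str, list[int]]:
    hosts: dict[str, list[int]] = {}
    for line in table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 9 or not cells[0].isdigit():
            continue  # skip the header row and +---+ separators
        assert "up" in cells[8].split(","), f"OSD {cells[0]} is not up"
        hosts.setdefault(cells[1], []).append(int(cells[0]))
    return hosts

print(osds_by_host(SAMPLE))  # → {'control01': [0], 'storage01': [18]}
```

Running this over the full table should yield 24 OSDs spread across the five hosts, all in the `exists,up` state.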

Remove node

Removing storage01 from the cluster

control01:cluster> remove_node
1: compute01
2: storage01
3: control01
4: control03
5: control02
Enter index: 2
this command is only applicable for compute or storage nodes
make sure its running instances have been properly terminated or migrated
shutdown the target host before proceeding
Enter 'YES' to confirm: YES
control01:cluster>

Check the storage status

The storage01 node has been removed from the cluster. The cluster reports HEALTH_WARN while Ceph rebalances data across the remaining 18 OSDs; this is expected and should clear once recovery completes.

control01> storage
control01:storage> status
cluster:
id: c6e64c49-09cf-463b-9d1c-b6645b4b3b85
health: HEALTH_WARN
Reduced data availability: 2 pgs inactive
Degraded data redundancy: 139/21222 objects degraded (0.655%), 10 pgs degraded

services:
mon: 3 daemons, quorum control01,control02,control03
mgr: control01(active), standbys: control02, control03
mds: cephfs-1/1/1 up {0=control01=up:active}, 2 up:standby
osd: 18 osds: 18 up, 18 in; 510 remapped pgs
rgw: 3 daemons active

data:
pools: 23 pools, 1837 pgs
objects: 10.50k objects, 12.7GiB
usage: 25.4GiB used, 2.61TiB / 2.63TiB avail
pgs: 10.670% pgs unknown
0.435% pgs not active
139/21222 objects degraded (0.655%)
1406 active+clean
214 active+clean+remapped
196 unknown
9 active+recovery_wait+degraded
5 activating+remapped
3 activating
3 active+undersized
1 active+undersized+degraded+remapped+backfilling

io:
client: 13.1KiB/s rd, 0B/s wr, 13op/s rd, 9op/s wr
recovery: 1.11MiB/s, 4objects/s

+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | control01 | 2060M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 1 | control01 | 2025M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 2 | control01 | 1093M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 3 | control01 | 1086M | 135G | 10 | 0 | 5 | 69 | exists,up |
| 4 | control02 | 1668M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 5 | control02 | 2086M | 116G | 0 | 0 | 0 | 0 | exists,up |
| 6 | control02 | 1093M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 7 | control02 | 1094M | 135G | 0 | 0 | 4 | 0 | exists,up |
| 8 | control03 | 1785M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 9 | control03 | 1957M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 10 | control03 | 1093M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 11 | control03 | 1094M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 12 | compute01 | 1463M | 56.5G | 0 | 0 | 0 | 0 | exists,up |
| 13 | compute01 | 1402M | 56.6G | 0 | 0 | 0 | 0 | exists,up |
| 14 | compute01 | 1336M | 56.7G | 0 | 0 | 0 | 0 | exists,up |
| 15 | compute01 | 1427M | 56.6G | 0 | 0 | 0 | 0 | exists,up |
| 16 | compute01 | 1106M | 464G | 0 | 0 | 0 | 0 | exists,up |
| 17 | compute01 | 1094M | 464G | 0 | 0 | 0 | 0 | exists,up |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
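The degraded figure in the warning above is simply the ratio of degraded object copies to the total tracked by the cluster, and the percentage it prints can be reproduced directly:

```python
# "139/21222 objects degraded" from the HEALTH_WARN status output
degraded, total = 139, 21222
pct = degraded / total * 100
print(f"{pct:.3f}%")  # → 0.655%, matching the reported value
```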

Run check_repair to make sure all services are OK

control03> cluster
control03:cluster> check_repair
Service Status Report
ClusterLink ok [ link(v) clock(v) dns(v) ]
ClusterSys ok [ bootstrap(v) license(v) ]
ClusterSettings ok [ etcd(v) ]
HaCluster FIXING [ hacluster(3) ]
ok [ hacluster(f) ]
MsgQueue ok [ rabbitmq(v) ]
IaasDb ok [ mysql(v) ]
VirtualIp ok [ vip(v) haproxy_ha(v) ]
Storage ok [ ceph(v) ceph_mon(v) ceph_mgr(v) ceph_mds(v) ceph_osd(v) ceph_rgw(v) rbd_target(v) ]
ApiService ok [ haproxy(v) httpd(v) lmi(v) memcache(v) ]
SingleSignOn ok [ keycloak(v) ]
Compute ok [ nova(v) ]
Baremetal ok [ ironic(v) ]
Network ok [ neutron(v) ]
Image ok [ glance(v) ]
BlockStor ok [ cinder(v) ]
FileStor ok [ manila(v) ]
ObjectStor ok [ swift(v) ]
Orchestration ok [ heat(v) ]
LBaaS ok [ octavia(v) ]
DNSaaS ok [ designate(v) ]
K8SaaS ok [ k3s(v) rancher(v) ]
InstanceHa ok [ masakari(v) ]
DisasterRecovery ok [ freezer(v) ]
BusinessLogic ok [ mistral(v) murano(v) cloudkitty(v) senlin(v) watcher(v) ]
ApiManager ok [ tyk(v) redis(v) mongodb(v) ]
DataPipe ok [ zookeeper(v) kafka(v) ]
Metrics ok [ monasca(v) telegraf(v) grafana(v) ]
LogAnalytics ok [ filebeat(v) auditbeat(v) logstash(v) es(v) kibana(v) ]
Notifications ok [ influxdb(v) kapacitor(v) ]
control03:cluster>
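When scanning a long report like this, it helps to pull out any service whose status is not `ok`. A minimal sketch that filters a saved copy of the report text (the sample lines are abbreviated from the output above):

```python
SAMPLE = """\
Service Status Report
ClusterLink ok [ link(v) clock(v) dns(v) ]
HaCluster FIXING [ hacluster(3) ]
MsgQueue ok [ rabbitmq(v) ]
"""

def not_ok(report: str) -> list[tuple[str, str]]:
    """Return (service, status) pairs whose status is not 'ok'."""
    flagged = []
    for line in report.splitlines():
        parts = line.split()
        # Require a "[ ... ]" detail section, and skip the bare
        # "ok [ ... ]" continuation lines that follow a FIXING entry.
        if len(parts) >= 2 and "[" in line and parts[0] != "ok" and parts[1] != "ok":
            flagged.append((parts[0], parts[1]))
    return flagged

print(not_ok(SAMPLE))  # → [('HaCluster', 'FIXING')]
```

A transient FIXING entry that is followed by an `ok [ ... ]` line, as in the report above, means check_repair repaired the service during the run.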

Back up storage01 policies

From your local PC's terminal:

$ scp -r root@storage01_IPADDRESS:/etc/policies Downloads/storage01_policy
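Before shutting the node down, it is a cheap safeguard to confirm the copied policy files match the originals, for example by comparing per-file checksums of the two trees. A minimal sketch, assuming both directories are accessible on the local machine (the paths are illustrative):

```python
import hashlib
from pathlib import Path

def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

# Example usage: compare the backup against a second copy, e.g. one
# fetched again via scp; identical trees produce identical digest maps.
# tree_digests(Path("Downloads/storage01_policy")) == tree_digests(Path("..."))
```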

Shut down storage01

$ ssh root@192.168.1.121
Warning: Permanently added '192.168.1.121' (ECDSA) to the list of known hosts.
Password:
Welcome to the Cube Appliance
Enter "help" for a list of available commands
storage01> shutdown
Enter 'YES' to confirm: YES
Connection to 192.168.1.121 closed by remote host.
Connection to 192.168.1.121 closed.

Adding Storage Host

Prepare a new node with CubeOS installed

Configuration

Reconfigure the new storage01 node by following any of the options listed below:

Connect to controller

$ ssh root@192.168.1.x
Warning: Permanently added '192.168.1.x' (ECDSA) to the list of known hosts.
Password:

Check & Repair services

control01> cluster
control01:cluster> check_repair
Service Status Report
ClusterLink ok [ link(v) clock(v) dns(v) ]
ClusterSys ok [ bootstrap(v) license(v) ]
ClusterSettings ok [ etcd(v) ]
HaCluster FIXING [ hacluster(3) ]
ok [ hacluster(f) ]
MsgQueue ok [ rabbitmq(v) ]
IaasDb ok [ mysql(v) ]
VirtualIp ok [ vip(v) haproxy_ha(v) ]
Storage ok [ ceph(v) ceph_mon(v) ceph_mgr(v) ceph_mds(v) ceph_osd(v) ceph_rgw(v) rbd_target(v) ]
ApiService ok [ haproxy(v) httpd(v) lmi(v) memcache(v) ]
SingleSignOn ok [ keycloak(v) ]
Compute ok [ nova(v) ]
Baremetal ok [ ironic(v) ]
Network ok [ neutron(v) ]
Image ok [ glance(v) ]
BlockStor ok [ cinder(v) ]
FileStor ok [ manila(v) ]
ObjectStor ok [ swift(v) ]
Orchestration ok [ heat(v) ]
LBaaS ok [ octavia(v) ]
DNSaaS ok [ designate(v) ]
K8SaaS ok [ k3s(v) rancher(v) ]
InstanceHa ok [ masakari(v) ]
DisasterRecovery ok [ freezer(v) ]
BusinessLogic ok [ mistral(v) murano(v) cloudkitty(v) senlin(v) watcher(v) ]
ApiManager ok [ tyk(v) redis(v) mongodb(v) ]
DataPipe ok [ zookeeper(v) kafka(v) ]
Metrics ok [ monasca(v) telegraf(v) grafana(v) ]
LogAnalytics ok [ filebeat(v) auditbeat(v) logstash(v) es(v) kibana(v) ]
Notifications ok [ influxdb(v) kapacitor(v) ]
control01:cluster>

Connect to storage01

$ ssh root@192.168.1.121
Warning: Permanently added '192.168.1.121' (ECDSA) to the list of known hosts.
Password:

Check the storage status. Once the new node has joined, all 24 OSDs should be up and the cluster should return to HEALTH_OK.

storage01> storage
storage01:storage> status
cluster:
id: c6e64c49-09cf-463b-9d1c-b6645b4b3b85
health: HEALTH_OK

services:
mon: 3 daemons, quorum control01,control02,control03
mgr: control01(active), standbys: control02, control03
mds: cephfs-1/1/1 up {0=control01=up:active}, 2 up:standby
osd: 24 osds: 24 up, 24 in
rgw: 3 daemons active

data:
pools: 23 pools, 1837 pgs
objects: 10.50k objects, 12.7GiB
usage: 31.6GiB used, 3.74TiB / 3.77TiB avail
pgs: 1837 active+clean

io:
client: 42.3KiB/s rd, 49op/s rd, 0op/s wr

+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | control01 | 2063M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 1 | control01 | 2036M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 2 | control01 | 1088M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 3 | control01 | 1088M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 4 | control02 | 1663M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 5 | control02 | 2080M | 116G | 0 | 0 | 0 | 0 | exists,up |
| 6 | control02 | 1096M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 7 | control02 | 1096M | 135G | 0 | 0 | 4 | 0 | exists,up |
| 8 | control03 | 1788M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 9 | control03 | 1952M | 117G | 0 | 0 | 0 | 0 | exists,up |
| 10 | control03 | 1096M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 11 | control03 | 1096M | 135G | 0 | 0 | 0 | 0 | exists,up |
| 12 | compute01 | 1464M | 56.5G | 0 | 0 | 0 | 0 | exists,up |
| 13 | compute01 | 1403M | 56.6G | 0 | 0 | 0 | 0 | exists,up |
| 14 | compute01 | 1337M | 56.7G | 0 | 0 | 0 | 0 | exists,up |
| 15 | compute01 | 1428M | 56.6G | 0 | 0 | 0 | 0 | exists,up |
| 16 | compute01 | 1109M | 464G | 0 | 0 | 0 | 0 | exists,up |
| 17 | compute01 | 1096M | 464G | 0 | 0 | 0 | 0 | exists,up |
| 18 | storage01 | 1042M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 19 | storage01 | 1042M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 20 | storage01 | 1042M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 21 | storage01 | 1042M | 57.0G | 0 | 0 | 0 | 0 | exists,up |
| 22 | storage01 | 1096M | 464G | 0 | 0 | 0 | 0 | exists,up |
| 23 | storage01 | 1112M | 464G | 0 | 0 | 0 | 0 | exists,up |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+