Storage Node Replacement Guide

Remove storage node from cluster

Make sure your storage data replication factor is either 2 or 3; with a factor of 1 there is no redundant copy to recover from once the node is removed.
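The replication factor is a per-pool setting in Ceph. A minimal way to verify it, assuming a root shell on a controller node with the standard Ceph CLI available (the appliance's storage scope shown below wraps the same cluster):

$ ceph osd pool ls detail | grep 'replicated size'   # every pool should report size 2 or 3
$ ceph osd pool get <pool-name> size                 # or query a single pool by name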

Connect to controller

$ ssh root@192.168.1.x
Warning: Permanently added '192.168.1.x' (ECDSA) to the list of known hosts.
Password:

Check the storage status

control01> storage
control01:storage> status
  cluster:
    id:     c6e64c49-09cf-463b-9d1c-b6645b4b3b85
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum control01,control02,control03
    mgr: control01(active), standbys: control02, control03
    mds: cephfs-1/1/1 up {0=control01=up:active}, 2 up:standby
    osd: 24 osds: 24 up, 24 in
    rgw: 3 daemons active

  data:
    pools:   23 pools, 1837 pgs
    objects: 10.50k objects, 12.7GiB
    usage:   31.3GiB used, 3.74TiB / 3.77TiB avail
    pgs:     1837 active+clean

  io:
    client: 15.5KiB/s rd, 0B/s wr, 15op/s rd, 10op/s wr

+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| id | host      | used  | avail | wr ops | wr data | rd ops | rd data | state     |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | control01 | 2063M | 117G  | 0      | 0       | 2      | 61      | exists,up |
| 1  | control01 | 2020M | 117G  | 0      | 0       | 1      | 36      | exists,up |
| 2  | control01 | 1089M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 3  | control01 | 1081M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 4  | control02 | 1656M | 117G  | 0      | 0       | 0      | 0       | exists,up |
| 5  | control02 | 2073M | 116G  | 0      | 0       | 0      | 0       | exists,up |
| 6  | control02 | 1089M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 7  | control02 | 1089M | 135G  | 0      | 0       | 4      | 0       | exists,up |
| 8  | control03 | 1781M | 117G  | 0      | 0       | 0      | 0       | exists,up |
| 9  | control03 | 1961M | 117G  | 0      | 0       | 7      | 157     | exists,up |
| 10 | control03 | 1089M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 11 | control03 | 1089M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 12 | compute01 | 1462M | 56.5G | 0      | 0       | 0      | 0       | exists,up |
| 13 | compute01 | 1400M | 56.6G | 0      | 0       | 0      | 0       | exists,up |
| 14 | compute01 | 1334M | 56.7G | 0      | 0       | 0      | 6       | exists,up |
| 15 | compute01 | 1426M | 56.6G | 0      | 0       | 0      | 0       | exists,up |
| 16 | compute01 | 1101M | 464G  | 0      | 0       | 0      | 19      | exists,up |
| 17 | compute01 | 1089M | 464G  | 0      | 0       | 0      | 0       | exists,up |
| 18 | storage01 | 1040M | 57.0G | 0      | 0       | 0      | 0       | exists,up |
| 19 | storage01 | 1040M | 57.0G | 0      | 0       | 0      | 0       | exists,up |
| 20 | storage01 | 1040M | 57.0G | 0      | 0       | 0      | 0       | exists,up |
| 21 | storage01 | 1048M | 57.0G | 0      | 0       | 0      | 0       | exists,up |
| 22 | storage01 | 1081M | 464G  | 0      | 0       | 0      | 0       | exists,up |
| 23 | storage01 | 1105M | 464G  | 0      | 0       | 0      | 0       | exists,up |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
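
Only proceed while the cluster reports HEALTH_OK. If you are scripting the pre-check, a minimal guard (again assuming the standard Ceph CLI on a controller shell) is:

$ ceph health | grep -q HEALTH_OK && echo 'safe to remove node' || echo 'resolve warnings first'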

Remove node

Removing storage01 from the cluster:

control01:cluster> remove_node
1: compute01
2: storage01
3: control01
4: control03
5: control02
Enter index: 2
this command is only applicable for compute or storage nodes
make sure its running instances have been properly terminated or migrated
shutdown the target host before proceeding
Enter 'YES' to confirm: YES
control01:cluster>
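
Removing the node triggers rebalancing onto the remaining OSDs. To follow progress from a controller shell (assuming the standard Ceph CLI), either stream cluster events or poll the status:

$ ceph -w               # stream health and recovery events as they happen
$ watch -n 10 ceph -s   # or re-print the full status every 10 seconds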

Check the storage status

The storage01 node has been removed from the cluster. Ceph reports HEALTH_WARN while it rebalances data across the remaining 18 OSDs.

control01> storage
control01:storage> status
  cluster:
    id:     c6e64c49-09cf-463b-9d1c-b6645b4b3b85
    health: HEALTH_WARN
            Reduced data availability: 2 pgs inactive
            Degraded data redundancy: 139/21222 objects degraded (0.655%), 10 pgs degraded

  services:
    mon: 3 daemons, quorum control01,control02,control03
    mgr: control01(active), standbys: control02, control03
    mds: cephfs-1/1/1 up {0=control01=up:active}, 2 up:standby
    osd: 18 osds: 18 up, 18 in; 510 remapped pgs
    rgw: 3 daemons active

  data:
    pools:   23 pools, 1837 pgs
    objects: 10.50k objects, 12.7GiB
    usage:   25.4GiB used, 2.61TiB / 2.63TiB avail
    pgs:     10.670% pgs unknown
             0.435% pgs not active
             139/21222 objects degraded (0.655%)
             1406 active+clean
             214  active+clean+remapped
             196  unknown
             9    active+recovery_wait+degraded
             5    activating+remapped
             3    activating
             3    active+undersized
             1    active+undersized+degraded+remapped+backfilling

  io:
    client:   13.1KiB/s rd, 0B/s wr, 13op/s rd, 9op/s wr
    recovery: 1.11MiB/s, 4objects/s

+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| id | host      | used  | avail | wr ops | wr data | rd ops | rd data | state     |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | control01 | 2060M | 117G  | 0      | 0       | 0      | 0       | exists,up |
| 1  | control01 | 2025M | 117G  | 0      | 0       | 0      | 0       | exists,up |
| 2  | control01 | 1093M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 3  | control01 | 1086M | 135G  | 10     | 0       | 5      | 69      | exists,up |
| 4  | control02 | 1668M | 117G  | 0      | 0       | 0      | 0       | exists,up |
| 5  | control02 | 2086M | 116G  | 0      | 0       | 0      | 0       | exists,up |
| 6  | control02 | 1093M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 7  | control02 | 1094M | 135G  | 0      | 0       | 4      | 0       | exists,up |
| 8  | control03 | 1785M | 117G  | 0      | 0       | 0      | 0       | exists,up |
| 9  | control03 | 1957M | 117G  | 0      | 0       | 0      | 0       | exists,up |
| 10 | control03 | 1093M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 11 | control03 | 1094M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 12 | compute01 | 1463M | 56.5G | 0      | 0       | 0      | 0       | exists,up |
| 13 | compute01 | 1402M | 56.6G | 0      | 0       | 0      | 0       | exists,up |
| 14 | compute01 | 1336M | 56.7G | 0      | 0       | 0      | 0       | exists,up |
| 15 | compute01 | 1427M | 56.6G | 0      | 0       | 0      | 0       | exists,up |
| 16 | compute01 | 1106M | 464G  | 0      | 0       | 0      | 0       | exists,up |
| 17 | compute01 | 1094M | 464G  | 0      | 0       | 0      | 0       | exists,up |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
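
Recovery runs in the background and can take a while on a loaded cluster. A minimal sketch of blocking until it settles back to HEALTH_OK (assumes the standard Ceph CLI; tune the poll interval to taste):

$ until ceph health | grep -q HEALTH_OK; do sleep 30; done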

Check & repair

Make sure all services are OK. check_repair fixes anything it flags, as with the MsgQueue service below (FIXING, then ok):

control01> cluster
control01:cluster> check_repair
  Service            Status  Report
  ClusterLink        ok      [ link(v) ]
  ClusterSettings    ok      [ etcd(v) ]
  HaCluster          ok      [ hacluster(v) ]
  MsgQueue           FIXING  [ rabbitmq(2) ]
Creating user "openstack" ...
Setting permissions for user "openstack" in vhost "/" ...
                     ok      [ rabbitmq(f) ]
  IaasDb             ok      [ mysql(v) ]
  VirtualIp          ok      [ vip(v) haproxy_ha(v) ]
  Storage            ok      [ ceph(v) ceph_mon(v) ceph_mgr(v) ceph_mds(v) ceph_osd(v) ceph_rgw(v) ]
  ApiService         ok      [ haproxy(v) apache2(v) lmi(v) memcache(v) ]
  Compute            ok      [ nova(v) ]
  Network            ok      [ neutron(v) ]
  Image              ok      [ glance(v) ]
  BlockStor          ok      [ cinder(v) ]
  FileStor           ok      [ manila(v) ]
  ObjectStor         ok      [ swift(v) ]
  Orchestration      ok      [ heat(v) ]
  LBaaS              ok      [ octavia(v) ]
  DNSaaS             ok      [ designate(v) ]
  InstanceHa         ok      [ masakari(v) ]
  DisasterRecovery   ok      [ freezer(v) es235(v) ]
  DataPipe           ok      [ zookeeper(v) kafka(v) ]
  Metrics            ok      [ ceilometer(v) monasca(v) telegraf(v) grafana(v) ]
  LogAnalytics       ok      [ filebeat(v) auditbeat(v) logstash(v) es(v) kibana(v) ]
  Notifications      ok      [ influxdb(v) kapacitor(v) ]
control01:cluster>

Backup storage01 policies

From your local PC terminal:

$ scp -r root@storage01_IPADDRESS:/etc/policies Downloads/storage01_policy
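
If rsync is available on both ends, it makes a resumable alternative to scp for the same backup (same placeholder address and paths as above):

$ rsync -av root@storage01_IPADDRESS:/etc/policies/ Downloads/storage01_policy/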

Shutdown storage01

$ ssh root@192.168.1.121
Warning: Permanently added '192.168.1.121' (ECDSA) to the list of known hosts.
Password:
Welcome to the Cube Appliance
Enter "help" for a list of available commands
storage01> shutdown
Enter 'YES' to confirm: YES
Connection to 192.168.1.121 closed by remote host.
Connection to 192.168.1.121 closed.
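
Before swapping the hardware, you can confirm from your local terminal that the host is really down (the IP matches the transcript above):

$ ping -c 3 192.168.1.121   # no replies means the node is powered off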

Adding Storage Host

Prepare a new node with CubeOS installed.

Configuration

Reconfigure the new node as storage01, matching the hostname and network settings of the node you removed; the sections below assume it comes back at 192.168.1.121.
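
If the replacement node should carry the old policies, this is the point to copy the backup taken earlier back onto it. A sketch, assuming the node is already reachable over SSH and rsync is available (paths mirror the backup step; adjust to your environment):

$ rsync -av Downloads/storage01_policy/ root@storage01_IPADDRESS:/etc/policies/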

Connect to controller

$ ssh root@192.168.1.x
Warning: Permanently added '192.168.1.x' (ECDSA) to the list of known hosts.
Password:

Check & Repair services

Welcome to the Cube Appliance
Enter "help" for a list of available commands
control01> cluster
control01:cluster> check_repair
  Service            Status  Report
  ClusterLink        ok      [ link(v) ]
  ClusterSettings    ok      [ etcd(v) ]
  HaCluster          ok      [ hacluster(v) ]
  MsgQueue           ok      [ rabbitmq(v) ]
  IaasDb             ok      [ mysql(v) ]
  VirtualIp          ok      [ vip(v) haproxy_ha(v) ]
  Storage            ok      [ ceph(v) ceph_mon(v) ceph_mgr(v) ceph_mds(v) ceph_osd(v) ceph_rgw(v) ]
  ApiService         ok      [ haproxy(v) apache2(v) lmi(v) memcache(v) ]
  Compute            ok      [ nova(v) ]
  Network            ok      [ neutron(v) ]
  Image              ok      [ glance(v) ]
  BlockStor          ok      [ cinder(v) ]
  FileStor           ok      [ manila(v) ]
  ObjectStor         ok      [ swift(v) ]
  Orchestration      ok      [ heat(v) ]
  LBaaS              ok      [ octavia(v) ]
  DNSaaS             ok      [ designate(v) ]
  InstanceHa         ok      [ masakari(v) ]
  DisasterRecovery   ok      [ freezer(v) es235(v) ]
  DataPipe           ok      [ zookeeper(v) kafka(v) ]
  Metrics            ok      [ ceilometer(v) monasca(v) telegraf(v) grafana(v) ]
  LogAnalytics       ok      [ filebeat(v) auditbeat(v) logstash(v) es(v) kibana(v) ]
  Notifications      ok      [ influxdb(v) kapacitor(v) ]
control01:cluster>

Connect to storage01

$ ssh root@192.168.1.121
Warning: Permanently added '192.168.1.121' (ECDSA) to the list of known hosts.
Password:
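
Because storage01 was reinstalled, it presents a new SSH host key. If the connection above is refused with a host-key mismatch, remove the stale entry on your local machine and retry:

$ ssh-keygen -R 192.168.1.121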

Check the storage status

storage01> storage
storage01:storage> status
  cluster:
    id:     c6e64c49-09cf-463b-9d1c-b6645b4b3b85
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum control01,control02,control03
    mgr: control01(active), standbys: control02, control03
    mds: cephfs-1/1/1 up {0=control01=up:active}, 2 up:standby
    osd: 24 osds: 24 up, 24 in
    rgw: 3 daemons active

  data:
    pools:   23 pools, 1837 pgs
    objects: 10.50k objects, 12.7GiB
    usage:   31.6GiB used, 3.74TiB / 3.77TiB avail
    pgs:     1837 active+clean

  io:
    client: 42.3KiB/s rd, 49op/s rd, 0op/s wr

+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| id | host      | used  | avail | wr ops | wr data | rd ops | rd data | state     |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | control01 | 2063M | 117G  | 0      | 0       | 0      | 0       | exists,up |
| 1  | control01 | 2036M | 117G  | 0      | 0       | 0      | 0       | exists,up |
| 2  | control01 | 1088M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 3  | control01 | 1088M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 4  | control02 | 1663M | 117G  | 0      | 0       | 0      | 0       | exists,up |
| 5  | control02 | 2080M | 116G  | 0      | 0       | 0      | 0       | exists,up |
| 6  | control02 | 1096M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 7  | control02 | 1096M | 135G  | 0      | 0       | 4      | 0       | exists,up |
| 8  | control03 | 1788M | 117G  | 0      | 0       | 0      | 0       | exists,up |
| 9  | control03 | 1952M | 117G  | 0      | 0       | 0      | 0       | exists,up |
| 10 | control03 | 1096M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 11 | control03 | 1096M | 135G  | 0      | 0       | 0      | 0       | exists,up |
| 12 | compute01 | 1464M | 56.5G | 0      | 0       | 0      | 0       | exists,up |
| 13 | compute01 | 1403M | 56.6G | 0      | 0       | 0      | 0       | exists,up |
| 14 | compute01 | 1337M | 56.7G | 0      | 0       | 0      | 0       | exists,up |
| 15 | compute01 | 1428M | 56.6G | 0      | 0       | 0      | 0       | exists,up |
| 16 | compute01 | 1109M | 464G  | 0      | 0       | 0      | 0       | exists,up |
| 17 | compute01 | 1096M | 464G  | 0      | 0       | 0      | 0       | exists,up |
| 18 | storage01 | 1042M | 57.0G | 0      | 0       | 0      | 0       | exists,up |
| 19 | storage01 | 1042M | 57.0G | 0      | 0       | 0      | 0       | exists,up |
| 20 | storage01 | 1042M | 57.0G | 0      | 0       | 0      | 0       | exists,up |
| 21 | storage01 | 1042M | 57.0G | 0      | 0       | 0      | 0       | exists,up |
| 22 | storage01 | 1096M | 464G  | 0      | 0       | 0      | 0       | exists,up |
| 23 | storage01 | 1112M | 464G  | 0      | 0       | 0      | 0       | exists,up |
+----+-----------+-------+-------+--------+---------+--------+---------+-----------+
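
Beyond eyeballing the table, two quick one-liners (assuming the standard Ceph CLI on the host shell) confirm the cluster is fully healed:

$ ceph pg stat    # expect all 1837 pgs active+clean
$ ceph osd tree   # all 24 OSDs should be shown as up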