Ceph-to-Ceph migration for Openstack leveraging RBD mirroring
Jan 4, 2021 · 1859 words · 9 minute read
Introduction
In IT, we use the term migration when we move stuff from A to B, where stuff = data + metadata.
Translating this to cloud infrastructure, the data is the virtual machine’s image/volume (blocks of data), while the metadata is the virtual machine’s attributes: cpu, memory, interfaces, ip addresses, ownership, etc. (Let me ignore for now the Object Storage scenario, where we distinguish data/metadata in a similar way - but the migration strategy/solution is different.)
To be more concrete, having an Openstack cloud with Ceph storage, the data/metadata duality can be described like this:
- data: the rbd images in the pools of the Ceph storage
- metadata: the mysql database of Openstack
In this post we discuss only the data migration step.
Overview
We have an Openstack cloud using a Ceph storage cluster; why would we want to migrate it to a different storage? One use case is that you want to deploy and maintain your storage in a different way in the future, and migrating in place between the two deployment solutions is very difficult (we never say impossible in IT). For example, moving from a ceph-ansible based solution to a juju based solution fits into this category. Let’s summarize the steps:
- we have a cloud control plane connected to a storage cluster called ceph-src
- we create another - empty - storage cluster called ceph-dst
- we mirror each image of a pool from ceph-src to ceph-dst
- once it’s done, we switch over the cloud from ceph-src to ceph-dst
Openstack and Ceph
Before we start working, we have to understand how Openstack relates to Ceph. You can have an Openstack deployment without Ceph; in this case you have local storage on the compute nodes. This solution does not scale well. Using Openstack with Ceph gives you a shared, scalable storage backend for images, ephemeral disks and volumes.
Openstack uses three Ceph pools by default:
- nova: to store ephemeral disks
- glance: to store images
- cinder-ceph: to store volumes
Nova creates the ephemeral disks based on the setting called libvirt_image_type. If its value is rbd, the disks will be created on Ceph (this is a very simplified explanation). You can check the disk section of the virtual machine’s xml definition: the host name references the ceph monitor host(s), and the name after the protocol definition references the pool/image for nova or cinder, respectively.
an ephemeral disk:
<source protocol='rbd' name='nova/523d0de7-016f-4ef0-ac89-386dd1ERA861_disk'>
<host name='10.33.11.251' port='6789'/>
a volume:
<source protocol='rbd' name='cinder-ceph/volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a'>
<host name='10.33.11.251' port='6789'/>
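For reference, here is a minimal sketch of the corresponding Nova configuration when it is managed by hand; the pool, user and secret values below are assumptions and are normally filled in by your deployment tool (which may expose the setting under a name like libvirt_image_type):
[libvirt]
# assumed example values - adjust to your deployment
images_type = rbd
images_rbd_pool = nova
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = nova-compute
rbd_secret_uuid = <uuid of the libvirt secret holding the cephx key>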
It is recommended to use the raw image format in glance because this way we can leverage Ceph’s COW (Copy-on-Write) capability (a quick way to inspect the resulting clone chain is shown after the list):
- we store the image in glance as raw
- a protected (read-only) snapshot is created from the image
- new instances will be the clones of this snapshot
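A quick way to see this clone chain on the source cluster is sketched below; the image uuid is a placeholder, and "snap" is the snapshot name the glance rbd driver uses:
ceph-src:> rbd snap ls glance/<image-uuid>            # <image-uuid>: placeholder for a glance image
ceph-src:> rbd children glance/<image-uuid>@snap      # lists the nova/cinder clones of that snapshot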
Major steps of the Proof of Concept
- create the source Ceph cluster
- configure Openstack to use this cluster
- create the destination Ceph cluster
- prevent any changes to the cloud
- no new instances/volumes are allowed to be created
- however, existing instances/volumes are still working
- configure the one-way mirroring between the source and destination cluster
- we could configure a two-way mirroring as well, but that would better fit into a DR solution
- we keep the cloud up & running, so there is no downtime up to this point
- switch-over
- shut down the instances
- detach the volumes from the instances
- deactivate images
- set the destination cluster as primary
- stop mirroring between the two clusters
- disconnect the cloud from the source cluster and connect it to the destination cluster
- restore the cloud
- start the APIs
Configure the one-way mirroring
Let me reference here the official rbd mirroring documentation:
RBD images can be asynchronously mirrored between two Ceph clusters. This capability is available in two modes:
- Journal-based: This mode uses the RBD journaling image feature to ensure point-in-time, crash-consistent replication between clusters.
- Snapshot-based: This mode uses periodically scheduled or manually created RBD image mirror-snapshots to replicate crash-consistent RBD images between clusters.
Mirroring is configured on a per-pool basis within peer clusters and can be configured on a specific subset of images within the pool.
Depending on the desired needs for replication, RBD mirroring can be configured for either one- or two-way replication:
- One-way Replication: When data is only mirrored from a primary cluster to a secondary cluster, the rbd-mirror daemon runs only on the secondary cluster.
- Two-way Replication: When data is mirrored from primary images on one cluster to non-primary images on another cluster (and vice-versa), the rbd-mirror daemon runs on both clusters.
We will implement journal-based one-way replication for each pool Openstack uses: nova, glance and cinder-ceph.
The main steps are the following:
- on the source cluster
- enable mirroring on the pools
- enable journaling on the images of the pools
- create a user/credential for the mirroring
- see how pools were created
- on the destination cluster
- create the pools on the destination cluster
- enable mirroring on the pools
- get the credentials from the source cluster
- create a user/credential for the mirroring
- install and configure the rbd-mirror daemon
- configure mirroring per pool
On the source cluster
Enable mirroring on the pools
From the man page of rbd: mirror pool enable [pool-name] mode
Enable RBD mirroring by default within a pool. The mirroring mode can either be pool or image. If configured in pool mode, all images in the pool with the journaling feature enabled are mirrored. If configured in image mode, mirroring needs to be explicitly enabled (by mirror image enable command) on each image.
We choose pool mode:
ceph-src:> for i in glance nova cinder-ceph; do rbd mirror pool enable $i pool; done
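If you want to verify that the mode took effect, rbd mirror pool info prints the configured mirroring mode (and later the peers) of a pool:
ceph-src:> rbd mirror pool info glance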
Enable journaling on the images of the pools
ceph-src:> for i in glance nova cinder-ceph; do for j in `rbd -p $i ls`; do rbd feature enable $i/$j journaling; done; done
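The feature can be confirmed on any individual image with rbd info; the instance uuid below is a placeholder:
ceph-src:> rbd info nova/<instance-uuid>_disk | grep features   # the features line should now include journaling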
Create a user/credential for the mirroring
ceph-src:> ceph auth get-or-create client.rbd-mirror-src mon 'profile rbd' osd 'profile rbd' -o /etc/ceph/ceph-src.client.rbd-mirror-src.keyring
On the destination cluster
Create the pools on the destination cluster
ceph-dst:> ceph osd pool create nova 32 32 replicated
ceph-dst:> ceph osd pool create glance 4 4 replicated
ceph-dst:> ceph osd pool create cinder-ceph 32 32 replicated
ceph-dst:> for i in nova glance cinder-ceph; do ceph osd pool application enable $i rbd; done
Enable mirroring on the pools
ceph-dst:> for i in glance nova cinder-ceph; do rbd mirror pool enable $i pool; done
Get the credentials from the source cluster
We just need a minimal ceph.conf snippet and the credential:
ceph-dst:> cat /etc/ceph/ceph-src.conf
[global]
mon host = 10.33.11.251 10.33.21.251 10.33.31.251
ceph-dst:> cat /etc/ceph/ceph-src.client.rbd-mirror-src.keyring
[client.rbd-mirror-src]
key = AQAXBetfm/yZFxAA3YmDIgNXEjj1GNuhNXxx6A==
Check whether we can reach the source cluster properly:
ceph-dst:> ceph --cluster ceph-src -n client.rbd-mirror-src osd lspools
1 nova
2 glance
3 cinder-ceph
Create a user/credential for the mirroring
This will be used by the rbd-mirror daemon later:
ceph-dst:> ceph auth get-or-create client.rbd-mirror-dst mon 'profile rbd' osd 'profile rbd' -o /etc/ceph/ceph.client.rbd-mirror-dst.keyring
Install and configure the rbd-mirror daemon
The rbd-mirror daemon is responsible for pulling image updates from the remote peer cluster and applying them to the image within the local cluster.
ceph-dst:> apt install rbd-mirror
ceph-dst:> systemctl enable ceph-rbd-mirror@rbd-mirror-dst.service
ceph-dst:> systemctl start ceph-rbd-mirror@rbd-mirror-dst.service
ceph-dst:> systemctl status ceph-rbd-mirror@rbd-mirror-dst.service
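On recent Ceph releases the running rbd-mirror daemon also appears in the services section of the cluster status, which is a convenient sanity check:
ceph-dst:> ceph -s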
Configure mirroring per pool
So far we have only prepared the mirroring; now it’s time to actually configure it:
- pool: glance
ceph-dst:> rbd mirror pool peer add glance client.rbd-mirror-src@ceph-src
98db5fc6-fc72-4c13-a3d0-c41616a23983
- pool: nova
ceph-dst:> rbd mirror pool peer add nova client.rbd-mirror-src@ceph-src
15d732fd-f183-4ba8-850e-5303da9056a2
- pool: cinder-ceph
ceph-dst:> rbd mirror pool peer add cinder-ceph client.rbd-mirror-src@ceph-src
17bf9724-e481-4b7e-bd8a-78982e27ae8b
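If you want to double-check the configuration, each pool should now list ceph-src as a peer:
ceph-dst:> for i in glance nova cinder-ceph; do rbd mirror pool info $i; done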
At this point, we have
- active mirroring between the source and the destination cluster
- Openstack APIs down to prevent changes to the cloud
- Openstack cloud instances/volumes are working; the cloud is still connected to the source cluster
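Before moving on to the switch-over, it is worth checking that replication has caught up. Besides the pool-level status used in the next section, a per-image status can be queried as well; the volume uuid below is a placeholder:
ceph-dst:> rbd mirror image status cinder-ceph/volume-<uuid>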
Switch-over
Shut down the instances
This step is necessary because we have to force the recreation of the libvirt configuration for the ephemeral disk(s), since we will use a different set of ceph monitors.
note: we just stop the instances - no need to delete and recreate them; basically, this is why we worked so hard so far!
openstack server stop vm0
openstack server stop vm1
openstack server stop vm2
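With more than a handful of instances, a small loop does the same job; this sketch assumes every ACTIVE server in the current project should be stopped:
for vm in $(openstack server list --status ACTIVE -f value -c Name); do openstack server stop "$vm"; done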
Detach the volumes from the instances
Again, this step is necessary to force the recreation of the libvirt configuration for the volume(s):
openstack server remove volume vm0 vo0
openstack server remove volume vm1 vo1
openstack server remove volume vm2 vo2
Deactivate images
openstack image set --deactivate cirros
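The state change can be verified with openstack image show; a deactivated image reports the status deactivated:
openstack image show cirros -c status -f value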
Demote-on-src, promote-on-dst: set the destination cluster as primary
- We make sure mirroring is healthy and every image is in the replaying state (no image is still syncing)
ceph-dst:> for i in glance nova cinder-ceph; do echo pool $i:; rbd mirror pool status $i; done
pool glance:
health: OK
images: 1 total
1 replaying
pool nova:
health: OK
images: 3 total
3 replaying
pool cinder-ceph:
health: OK
images: 3 total
3 replaying
- We demote/promote the pool and implicitly each image in that pool in one step
- We execute all the commands on the destination cluster since we have access to both clusters from there
ceph-dst:> rbd --cluster ceph-src -n client.rbd-mirror-src mirror pool demote glance
Demoted 1 mirrored images
ceph-dst:> rbd mirror pool promote glance
Promoted 1 mirrored images
ceph-dst:> rbd --cluster ceph-src -n client.rbd-mirror-src mirror pool demote nova
Demoted 3 mirrored images
ceph-dst:> rbd mirror pool promote nova
Promoted 3 mirrored images
ceph-dst:> rbd --cluster ceph-src -n client.rbd-mirror-src mirror pool demote cinder-ceph
Demoted 3 mirrored images
ceph-dst:> rbd mirror pool promote cinder-ceph
Promoted 3 mirrored images
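To confirm that the destination images are now the primaries, rbd info on any mirrored image should report it as primary; the image uuid below is a placeholder:
ceph-dst:> rbd info glance/<image-uuid> | grep mirroring   # expect: mirroring primary: true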
Stop mirroring between the two clusters
Remove peers
- pool: glance
ceph-dst:> rbd mirror pool peer remove glance 98db5fc6-fc72-4c13-a3d0-c41616a23983
- pool: nova
ceph-dst:> rbd mirror pool peer remove nova 15d732fd-f183-4ba8-850e-5303da9056a2
- pool: cinder-ceph
ceph-dst:> rbd mirror pool peer remove cinder-ceph 17bf9724-e481-4b7e-bd8a-78982e27ae8b
Disable mirroring per pool
ceph-dst:> for i in glance nova cinder-ceph; do rbd mirror pool disable $i; done
Disable journaling
ceph-dst:> for i in glance nova cinder-ceph; do for j in `rbd -p $i ls`; do rbd feature disable $i/$j journaling; done; done
Stop and disable the rbd-mirror daemon
ceph-dst:> systemctl stop ceph-rbd-mirror@rbd-mirror-dst.service
ceph-dst:> systemctl disable ceph-rbd-mirror@rbd-mirror-dst.service
Disconnect the cloud from the source cluster and connect it to the destination cluster
This step is very exciting, however, it’s outside the scope of this post.
Restore the cloud
openstack image set --activate cirros
openstack server add volume vm0 vo0
openstack server add volume vm1 vo1
openstack server add volume vm2 vo2
openstack server start vm0
openstack server start vm1
openstack server start vm2
Check the xml definition of the instances; this is what we had with the source cluster:
...
<source protocol='rbd' name='nova/523d0de7-016f-4ef0-ac89-386dd1ERA861_disk'>
<host name='10.33.11.251' port='6789'/>
<host name='10.33.21.251' port='6789'/>
<host name='10.33.31.251' port='6789'/>
...
<source protocol='rbd' name='cinder-ceph/volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a'>
<host name='10.33.11.251' port='6789'/>
<host name='10.33.21.251' port='6789'/>
<host name='10.33.31.251' port='6789'/>
...
This is what we have now with the destination cluster:
...
<source protocol='rbd' name='nova/523d0de7-016f-4ef0-ac89-386dd1ERA861_disk'>
<host name='10.33.10.41' port='6789'/>
<host name='10.33.10.42' port='6789'/>
<host name='10.33.10.43' port='6789'/>
...
<source protocol='rbd' name='cinder-ceph/volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a'>
<host name='10.33.10.41' port='6789'/>
<host name='10.33.10.42' port='6789'/>
<host name='10.33.10.43' port='6789'/>
...
- The rbd image references are unchanged for both nova and cinder-ceph
- The ceph monitor references are different, since we replaced the storage
Start the APIs
Now you can enable access to the cloud and allow tenants to change things again.
Closure
We focused on the rbd mirroring step and obviously had to skip many other steps. You can read a longer version of this proof of concept with working examples here.
References
- https://docs.ceph.com/en/latest/rbd/rbd-mirroring/
- https://docs.ceph.com/en/latest/rbd/libvirt/
- https://docs.ceph.com/en/latest/rbd/qemu-rbd/
- https://docs.openstack.org/nova/latest/admin/configuration/hypervisor-kvm.html
- https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-ceph-rbd-mirror.html
- https://cloud.garr.it/support/kb/ceph/ceph-enabling-rbd-mirror/
- https://pve.proxmox.com/wiki/Ceph_RBD_Mirroring