Ceph-to-Ceph migration for Openstack leveraging RBD mirroring

Jan 4, 2021 · 1859 words · 9 minute read openstack ceph juju

Introduction

In IT, we use the term migration when we move stuff from A to B, where stuff = data + metadata.

Translating this to Cloud Infrastructure, data is the virtual machine’s image/volume (blocks of data), while metadata is the virtual machine’s attributes: CPU, memory, interfaces, IP addresses, ownership, etc. (Let me ignore the Object Storage scenario for now; it has a similar data/metadata distinction, but the migration strategy/solution is different.)

To be more concrete, for an Openstack cloud with Ceph storage, the data/metadata duality can be described like this:

  • data: the rbd images in the pools of the Ceph storage
  • metadata: the mysql database of Openstack

In this post we discuss only the data migration step.

Overview

We have an Openstack cloud using a Ceph storage cluster; why would we want to migrate it to a different storage cluster? One use case is that you want to deploy and maintain your storage in a different way in the future, and migrating between the two configuration solutions is very difficult (we never say impossible in IT). For example, moving from a ceph-ansible based solution to a juju based solution fits into this category. Let’s summarize the steps:

  • we have a cloud control plane connected to a storage cluster called ceph-src
  • we create another - empty - storage cluster called ceph-dst
  • we mirror each image of a pool from ceph-src to ceph-dst
  • once it’s done, we switch over the cloud from ceph-src to ceph-dst

Openstack and Ceph

Before we start working, we have to understand how Openstack relates to Ceph. You can have an Openstack deployment without Ceph; in this case you have local storage on the compute nodes. This solution does not scale well. Using Openstack with Ceph provides great flexibility.

In this deployment, Openstack uses three Ceph pools:

  • nova: to store ephemeral disks
  • glance: to store images
  • cinder-ceph: to store volumes

Nova creates the ephemeral disks based on the setting called libvirt_image_type. If its value is rbd, images will be created on Ceph (this is a very simplified explanation). You can check the disk section of the virtual machine’s xml definition: the host name references the Ceph monitor host(s), and the name after the protocol definition references the pool/image for nova or cinder-ceph, respectively.

an ephemeral disk:

<source protocol='rbd' name='nova/523d0de7-016f-4ef0-ac89-386dd1ERA861_disk'>
 <host name='10.33.11.251' port='6789'/>

a volume:

<source protocol='rbd' name='cinder-ceph/volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a'>
  <host name='10.33.11.251' port='6789'/>
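
For reference, the wiring on the Openstack side looks roughly like the sketch below. The option names are the ones used by recent nova/cinder releases, while the section names, pool names, users and the libvirt secret are assumptions matching this PoC and differ per deployment:

# nova.conf on the compute nodes (ephemeral disks on Ceph)
[libvirt]
images_type = rbd
images_rbd_pool = nova
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = nova-compute
rbd_secret_uuid = <libvirt secret uuid>

# cinder.conf (volumes on Ceph)
[cinder-ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = cinder-ceph
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder-ceph
rbd_secret_uuid = <libvirt secret uuid>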

It is recommended to use raw image format in glance because this way we can leverage Ceph’s COW (Copy-on-Write) capability:

  • we store the image in glance as raw
  • a protected (read-only) snapshot is created from the image
  • new instances will be the clones of this snapshot
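
This chain is visible directly on the source cluster. A quick way to check it (the glance image id below is a placeholder; the protected snapshot glance creates is typically called snap):

ceph-src:> rbd snap ls glance/<glance-image-id>
ceph-src:> rbd children glance/<glance-image-id>@snap
ceph-src:> rbd info nova/523d0de7-016f-4ef0-ac89-386dd1ERA861_disk | grep parent

snap ls shows the protected snapshot, children lists the instance disks cloned from it, and the parent field of an ephemeral disk points back to the glance image.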

Major steps of the Proof of Concept

  • create the source Ceph cluster
  • configure Openstack to use this cluster
  • create the destination Ceph cluster
  • prevent any changes to the cloud
    • no new instances/volumes are allowed to be created
    • however, existing instances/volumes are still working
  • configure the one-way mirroring between the source and destination cluster
    • we could configure a two-way mirroring as well, but that would better fit into a DR solution
    • we keep the cloud up & running, so there is no downtime up to this point
  • switch-over
    • shut down the instances
    • detach the volumes from the instances
    • deactivate images
    • set the destination cluster as primary
    • stop mirroring between the two clusters
    • disconnect the cloud from the source cluster and connect it to the destination cluster
    • restore the cloud
    • start the APIs

Configure the one-way mirroring

Let me reference here the official rbd mirroring documentation:

RBD images can be asynchronously mirrored between two Ceph clusters. This capability is available in two modes:

  • Journal-based: This mode uses the RBD journaling image feature to ensure point-in-time, crash-consistent replication between clusters.
  • Snapshot-based: This mode uses periodically scheduled or manually created RBD image mirror-snapshots to replicate crash-consistent RBD images between clusters.

Mirroring is configured on a per-pool basis within peer clusters and can be configured on a specific subset of images within the pool.

Depending on the desired needs for replication, RBD mirroring can be configured for either one- or two-way replication:

  • One-way Replication: When data is only mirrored from a primary cluster to a secondary cluster, the rbd-mirror daemon runs only on the secondary cluster.
  • Two-way Replication: When data is mirrored from primary images on one cluster to non-primary images on another cluster (and vice-versa), the rbd-mirror daemon runs on both clusters.

We will implement journal-based one-way replication for each pool Openstack uses: nova, glance and cinder-ceph.

The main steps are the following:

  • on the source cluster
    • enable mirroring on the pools
    • enable journaling on the images of the pools
    • create a user/credential for the mirroring
    • check how the pools were created, so they can be recreated the same way on the destination
  • on the destination cluster
    • create the pools on the destination cluster
    • enable mirroring on the pools
    • get the credentials from the source cluster
    • create a user/credential for the mirroring
    • install and configure the rbd-mirror daemon
    • configure mirroring per pool

On the source cluster

Enable mirroring on the pools

From the man page of rbd: mirror pool enable [pool-name] mode

Enable RBD mirroring by default within a pool. The mirroring mode can either be pool or image. If configured in pool mode, all images in the pool with the journaling feature enabled are mirrored. If configured in image mode, mirroring needs to be explicitly enabled (by mirror image enable command) on each image.

We choose pool mode:

ceph-src:> for i in glance nova cinder-ceph; do rbd mirror pool enable $i pool; done
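
To verify that the mode took effect, rbd mirror pool info should report pool mode (and, at this point, no peers yet):

ceph-src:> for i in glance nova cinder-ceph; do echo pool $i:; rbd mirror pool info $i; done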

Enable journaling on the images of the pools

ceph-src:> for i in glance nova cinder-ceph; do for j in `rbd -p $i ls`; do rbd feature enable $i/$j journaling; done; done
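
Journaling builds on the exclusive-lock image feature, which images created by Openstack normally already have. A quick check that the feature now shows up on every image:

ceph-src:> for i in glance nova cinder-ceph; do for j in `rbd -p $i ls`; do echo -n "$i/$j: "; rbd info $i/$j | grep features; done; done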

Create a user/credential for the mirroring

ceph-src:> ceph auth get-or-create client.rbd-mirror-src mon 'profile rbd' osd 'profile rbd' -o /etc/ceph/ceph-src.client.rbd-mirror-src.keyring

On the destination cluster

Create the pools on the destination cluster

ceph-dst:> ceph osd pool create nova 32 32 replicated
ceph-dst:> ceph osd pool create glance 4 4 replicated
ceph-dst:> ceph osd pool create cinder-ceph 32 32 replicated
ceph-dst:> for i in nova glance cinder-ceph; do ceph osd pool application enable $i rbd; done
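
The pg/pgp numbers and the replicated profile above are simply what this PoC used; in a real migration, this is where the "check how the pools were created" step from the plan pays off, so the destination pools can be created with matching parameters:

ceph-src:> ceph osd pool ls detail
ceph-src:> for i in nova glance cinder-ceph; do echo pool $i:; ceph osd pool get $i pg_num; ceph osd pool get $i size; done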

Enable mirroring on the pools

ceph-dst:> for i in glance nova cinder-ceph; do rbd mirror pool enable $i pool; done

Get the credentials from the source cluster

We just need a minimal ceph.conf snippet and the credential:

ceph-dst:> cat /etc/ceph/ceph-src.conf 
[global]
mon host = 10.33.11.251 10.33.21.251 10.33.31.251

ceph-dst:> cat /etc/ceph/ceph-src.client.rbd-mirror-src.keyring 
[client.rbd-mirror-src]
	key = AQAXBetfm/yZFxAA3YmDIgNXEjj1GNuhNXxx6A==

Check whether we can reach the source cluster properly:

ceph-dst:> ceph --cluster ceph-src -n client.rbd-mirror-src osd lspools
1 nova
2 glance
3 cinder-ceph

Create a user/credential for the mirroring

This will be used by the rbd-mirror daemon later:

ceph-dst:> ceph auth get-or-create client.rbd-mirror-dst mon 'profile rbd' osd 'profile rbd' -o /etc/ceph/ceph.client.rbd-mirror-dst.keyring

Install and configure the rbd-mirror daemon

The rbd-mirror daemon is responsible for pulling image updates from the remote peer cluster and applying them to the image within the local cluster.

ceph-dst:> apt install rbd-mirror
ceph-dst:> systemctl enable ceph-rbd-mirror@rbd-mirror-dst.service
ceph-dst:> systemctl start ceph-rbd-mirror@rbd-mirror-dst.service
ceph-dst:> systemctl status ceph-rbd-mirror@rbd-mirror-dst.service
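
Note that the systemd instance name (rbd-mirror-dst) has to match the cephx id we created above (client.rbd-mirror-dst), otherwise the daemon cannot find its keyring. On recent Ceph releases the running daemon also shows up in the services section of the cluster status:

ceph-dst:> ceph -s | grep -A6 services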

Configure mirroring per pool

So far we have only prepared the mirroring; now it’s time to actually configure it:

  • pool: glance
ceph-dst:> rbd mirror pool peer add glance client.rbd-mirror-src@ceph-src
98db5fc6-fc72-4c13-a3d0-c41616a23983
  • pool: nova
ceph-dst:> rbd mirror pool peer add nova client.rbd-mirror-src@ceph-src
15d732fd-f183-4ba8-850e-5303da9056a2
  • pool: cinder-ceph
ceph-dst:> rbd mirror pool peer add cinder-ceph client.rbd-mirror-src@ceph-src
17bf9724-e481-4b7e-bd8a-78982e27ae8b
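
The UUIDs printed by peer add are the peer ids; we will need them again when removing the peers during the switch-over. Since the pools are in pool mode and journaling is enabled on the images, replication starts right away and can be followed per pool:

ceph-dst:> for i in glance nova cinder-ceph; do echo pool $i:; rbd mirror pool info $i; done
ceph-dst:> rbd mirror pool status glance --verbose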

At this point, we have

  • active mirroring between the source and the destination cluster
  • Openstack APIs down to prevent changes to the cloud
  • Openstack cloud instances/volumes are working; the cloud is still connected to the source cluster

Switch-over

Shut down the instances

This step is necessary because we have to force the recreation of the libvirt configuration for the ephemeral disk(s), since we will use a different set of Ceph monitors.

Note: we just stop the instances; there is no need to delete and recreate them. Basically, this is why we worked so hard so far!

openstack server stop vm0
openstack server stop vm1
openstack server stop vm2

Detach the volumes from the instances

Again, this step is necessary to force the recreation of the libvirt configuration for the volume(s):

openstack server remove volume vm0 vo0
openstack server remove volume vm1 vo1
openstack server remove volume vm2 vo2

Deactivate images

openstack image set --deactivate cirros
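
A quick check that the image is indeed deactivated:

openstack image show cirros -c status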

Demote-on-src, promote-on-dst: set the destination cluster as primary

  • We make sure the mirroring is healthy and no initial sync is still in progress (every image is in the replaying state)
ceph-dst:> for i in glance nova cinder-ceph; do echo pool $i:; rbd mirror pool status $i; done
pool glance:
health: OK
images: 1 total
    1 replaying
pool nova:
health: OK
images: 3 total
    3 replaying
pool cinder-ceph:
health: OK
images: 3 total
    3 replaying
  • We demote/promote the pool and implicitly each image in that pool in one step
  • We execute all the commands on the destination cluster since we have access to both clusters from there
ceph-dst:> rbd --cluster ceph-src -n client.rbd-mirror-src mirror pool demote glance
Demoted 1 mirrored images

ceph-dst:> rbd mirror pool promote glance
Promoted 1 mirrored images

ceph-dst:> rbd --cluster ceph-src -n client.rbd-mirror-src mirror pool demote nova
Demoted 3 mirrored images

ceph-dst:> rbd mirror pool promote nova
Promoted 3 mirrored images

ceph-dst:> rbd --cluster ceph-src -n client.rbd-mirror-src mirror pool demote cinder-ceph
Demoted 3 mirrored images

ceph-dst:> rbd mirror pool promote cinder-ceph
Promoted 3 mirrored images
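
To double-check the result, the images on ceph-dst should now be primary and the ones on ceph-src non-primary. One way to see this (the image id is a placeholder and the exact wording of the mirroring lines in rbd info varies between releases):

ceph-dst:> rbd info glance/<glance-image-id> | grep mirroring
ceph-dst:> rbd --cluster ceph-src -n client.rbd-mirror-src info glance/<glance-image-id> | grep mirroring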

Stop mirroring between the two clusters

Remove peers

  • pool: glance
ceph-dst:> rbd mirror pool peer remove glance 98db5fc6-fc72-4c13-a3d0-c41616a23983
  • pool: nova
ceph-dst:> rbd mirror pool peer remove nova 15d732fd-f183-4ba8-850e-5303da9056a2
  • pool: cinder-ceph
ceph-dst:> rbd mirror pool peer remove cinder-ceph 17bf9724-e481-4b7e-bd8a-78982e27ae8b

Disable mirroring per pool

ceph-dst:> for i in glance nova cinder-ceph; do rbd mirror pool disable $i; done

Disable journaling

ceph-dst:> for i in glance nova cinder-ceph; do for j in `rbd -p $i ls`; do rbd feature disable $i/$j journaling; done; done

Stop and disable the rbd-mirror daemon

ceph-dst:> systemctl stop ceph-rbd-mirror@rbd-mirror-dst.service
ceph-dst:> systemctl disable ceph-rbd-mirror@rbd-mirror-dst.service

Disconnect the cloud from the source cluster and connect it to the destination cluster

This step is very exciting; however, it is outside the scope of this post.

Restore the cloud

openstack image set --activate cirros

openstack server add volume vm0 vo0
openstack server add volume vm1 vo1
openstack server add volume vm2 vo2

openstack server start vm0
openstack server start vm1
openstack server start vm2

Check the xml definition of the instances; this is what we had with the source cluster:

...
      <source protocol='rbd' name='nova/523d0de7-016f-4ef0-ac89-386dd1ERA861_disk'>
        <host name='10.33.11.251' port='6789'/>
        <host name='10.33.21.251' port='6789'/>
        <host name='10.33.31.251' port='6789'/>
...
      <source protocol='rbd' name='cinder-ceph/volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a'>
        <host name='10.33.11.251' port='6789'/>
        <host name='10.33.21.251' port='6789'/>
        <host name='10.33.31.251' port='6789'/>
...

This is what we have now with the destination cluster:

...
      <source protocol='rbd' name='nova/523d0de7-016f-4ef0-ac89-386dd1ERA861_disk'>
        <host name='10.33.10.41' port='6789'/>
        <host name='10.33.10.42' port='6789'/>
        <host name='10.33.10.43' port='6789'/>
...
      <source protocol='rbd' name='cinder-ceph/volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a'>
        <host name='10.33.10.41' port='6789'/>
        <host name='10.33.10.42' port='6789'/>
        <host name='10.33.10.43' port='6789'/>
...
  • The references to the rbd images are unchanged for nova and cinder-ceph, respectively
  • The referenced Ceph monitors are different, since we replaced the storage cluster
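
Before handing the cloud back to the users, it is also worth checking on the destination cluster that every image arrived, and on a compute node that libvirt really points at the new monitors (the libvirt domain name below is a hypothetical example):

ceph-dst:> for i in nova glance cinder-ceph; do echo pool $i:; rbd -p $i ls; done
compute:> virsh dumpxml instance-00000001 | grep -A4 "protocol='rbd'"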

Start the APIs

Now you can enable access to the cloud and allow tenants to change things again.

Closure

We focused on the rbd mirroring step and obviously had to skip many other steps. You can read a longer version of this proof of concept, with working examples, here.

László Angyal
Cloud Architect