DevStack, IPv6 and Scaleway are in a boat...
⚠️ This article is an automated translation. While I personally reviewed the content before publication, some inaccuracies may remain. Read the original French version.
This year, I took over a new Cloud Technologies course for third-year engineers at INP Clermont-Auvergne. Wanting to provide them with fun practical exercises (TPs) where we could really push the concept of Cloud Computing to the maximum, I embarked on the somewhat crazy project of deploying an OpenStack with DevStack, all in native IPv6. A look back at 4 weeks of hair-pulling, 4 weeks of sleepless nights, but for a result that is well worth the detour 😊.
Where do we start?
The first question when attacking such a project is, of course, feasibility: my last attempt with DevStack dated back to 2020 (and it hadn't been a great success), and I had never set foot in the IPv6 universe. Two huge unknowns, coupled with a third existential question: what infrastructure should host my OpenStack? To answer that, we need to look at the infra requirements. Since we are talking about practical work over a limited period, we obviously don't have the same constraints as a company that wants to run its infra 24/7 and support zero-downtime updates: in our case, never mind updates, those can wait until next year.
Taking into account that I will have about 44 students on this infrastructure, divided into groups of at most 17 students, who will launch at most 2 instances simultaneously each, I would need:
- 17 × 2 GB of RAM, i.e., 34 GB;
- 17 × 2 CPUs, i.e., 34 CPUs;
- 17 × 10 GB of disk, i.e., 170 GB;
- And 34 IPv6 (well, that’s easy).
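The sizing above can be double-checked with a quick sketch, under the assumptions the list implies (1 GB of RAM and 1 vCPU per instance, 10 GB of disk per student):

```shell
# Sizing sketch for one group: 17 students, 2 instances each.
# Assumes 1 GB of RAM and 1 vCPU per instance, and 10 GB of disk
# per student, as implied by the list above.
students=17
instances=$((students * 2))   # 34 concurrent instances
ram_gb=$((instances * 1))     # GB of RAM
cpus=$((instances * 1))       # vCPUs
disk_gb=$((students * 10))    # GB of disk
echo "$ram_gb GB RAM, $cpus CPUs, $disk_gb GB disk"
```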
Having no desire to spend a good thousand euros to buy a 2U blade that would serve me three months a year, with the additional constraints of power and network supply to manage, I quickly turned to renting dedicated servers. And for that, France is among the great champions, notably with OVHCloud and Scaleway.
Unfortunately, my choice was quickly limited: OVHCloud only offers a single SLAAC IPv6 on its Kimsufi offers. Given my constrained budget, that solution quickly faded. I also took a look at Hetzner, which offers very interesting rates. However, I let myself be convinced by Online/Scaleway, as I am already familiar with their service.
To test is to doubt (but we doubt, so we test)
In order to test in real conditions without breaking the bank, I ordered an entry-level Elastic Metal server, the EM-A115X-SSD. The server offers a 4-core CPU (Xeon E3 1220), 32 GB of RAM, and 2 TB of SSD, for €33 per month. Fearing that the CPU would suffer greatly with 34 instances running, I accepted from the start that this server would only serve as a lab, given that Elastic Metal is billed by the hour: €0.091 per hour in our case.
This server allowed me to experiment with my DevStack deployment with a routable IPv6 block. Since DevStack's documentation is very sparse, and Scaleway's documentation on IPv6 isn't much better, I struggled for more than two weeks to get my IPv6 prefix working (or at least, to get OpenStack to use it). I'll spare you the details, but I saw it all: incoming traffic but no outgoing, outgoing traffic but no incoming, no network at all (that was most of the time…), and I came close to giving up several times.
However, persistence pays off, and thanks to the help of Justine and Louis (thanks ❤️), I finally managed to start a functional DevStack, with functional IPv6 delegation. Two days later, I managed to reproduce the deployment from scratch without any bad surprises: it works, I can finally move on to a “real” infra!
Before moving on, here is the configuration that worked for me, on OpenStack version 2025.1 (dev), deployed from DevStack's main branch:
- sysctl configuration:

```
net.ipv6.conf.all.forwarding=1
net.ipv6.conf.eno1.autoconf=0
net.ipv6.conf.eno1.accept_ra=2
```
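To make these settings survive a reboot, one option (assuming a distribution that reads `/etc/sysctl.d`, as Ubuntu does; the file name below is arbitrary) is to drop them into a dedicated file:

```shell
# Persist the sysctl settings above across reboots.
cat > /etc/sysctl.d/99-devstack-ipv6.conf <<'EOF'
net.ipv6.conf.all.forwarding=1
net.ipv6.conf.eno1.autoconf=0
net.ipv6.conf.eno1.accept_ra=2
EOF

# Reload every sysctl configuration file immediately.
sysctl --system
```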
- netplan configuration:

```yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:
      critical: true
      dhcp-identifier: mac
      dhcp4: true
      dhcp6: true
      accept-ra: false
      addresses:
        - "IPV6_PUBLIC_BLOC"
      routes:
        - to: "::/0"
          via: "GATEWAY_V6"
          on-link: true
      nameservers:
        addresses:
          - 8.8.8.8
          - 51.159.47.28
          - 51.159.47.26
```
- And the DevStack configuration, the famous `local.conf`:

```ini
[[local|localrc]]

# Change these values
HOST_IP=YOUR_IPV4_HERE
ADMIN_PASSWORD=YOUR_ADMIN_PASSWORD
IPV6_ADDRS_SAFE_TO_USE="IPV6_LOCAL_BLOC_STARTING_WITH_FE80"
IPV6_PRIVATE_NETWORK_GATEWAY="GATEWAY_V6"
IPV6_PUBLIC_RANGE="IPV6_PUBLIC_BLOC"
IPV6_PUBLIC_NETWORK_GATEWAY="IPV6_PUBLIC_GATEWAY (the public block without the /56 part)"
HOST_IPV6="HOST_IPV6"

# Don't touch this
SERVICE_HOST=$HOST_IP
MYSQL_HOST=$HOST_IP
RABBIT_HOST=$HOST_IP
GLANCE_HOSTPORT=$HOST_IP:9292
DATABASE_PASSWORD=$ADMIN_PASSWORD
RABBIT_PASSWORD=$ADMIN_PASSWORD
SERVICE_PASSWORD=$ADMIN_PASSWORD
Q_USE_SECGROUP=True
IP_VERSION=6
PUBLIC_INTERFACE=eno1
IPV6_RA_MODE=slaac
IPV6_ADDRESS_MODE=slaac
Q_USE_PROVIDERNET_FOR_PUBLIC=True
OVS_PHYSICAL_BRIDGE=br-ex
PUBLIC_BRIDGE=br-ex
OVS_BRIDGE_MAPPINGS=public:br-ex
TEMPEST_INSTALL=False
INSTALL_TEMPEST=False
```
Moving to monthly dedicated servers
Now that we have a functional laboratory, it’s time to deploy on a larger scale. Since the Elastic Metal server we have is limited in terms of CPU, we are going to change models and range: indeed, Scaleway’s Elastic range is expensive, and, as said earlier, I don’t have much money available for this project.
Scaleway offers its Dedibox range of dedicated servers paid monthly, at rates much lower than those of its Elastic Metal. Being a regular user of this service which I know is reliable, I decided to switch to it. I set my sights on two START-2-M-SATA dedicated servers, which offer 8 CPUs, 16GB of RAM, and 1TB of HDD each, i.e., an infra of 16 CPUs, 32GB of RAM, and 2TB of HDD. I paid the first month (and the installation fees, always a pleasure), and I started the installation of the two OpenStacks… and then, disaster struck.
IPv6 prefix delegation DUID
Indeed, the Dedibox infrastructure differs slightly from Elastic Metal on the IPv6 side. On Elastic Metal, assigning your IPv6 blocks is done directly in the graphical interface: you declare that such-and-such /64 block is attached to your Elastic server, and Scaleway handles the block assignment on its side. On Dedibox, however, Scaleway provides you with blocks, and it's up to your server (via dhclient, for example) to take over management of the block, authenticating itself beforehand. So, here we go again with network tinkering.
Of course, don’t count too much on the documentation: it is, at best, very old, at worst, non-existent. Fortunately, I stumbled upon this post by kgersen on lafibre.info (a HUGE thank you to you!) which mentions the need to tweak dhclient to add a piece of configuration: indeed, Netplan does not support adding a DUID to an IPv6 configuration.
`/etc/dhcp/dhclient6.conf`:

```
interface "eno1" {
    send dhcp6.client-id FULL_DUID;
}

interface "br-ex" {
    send dhcp6.client-id FULL_DUID;
}
```
`/etc/systemd/system/dhclient6.service`:

```ini
[Service]
Type=forking
Restart=always
RestartSec=2s
TimeoutSec=10s
ExecStart=/sbin/dhclient -1 -v -pf /run/dhclient6.pid -cf /etc/dhcp/dhclient6.conf -lf /var/lib/dhcp/dhclient6.leases -6 -P eno1
ExecStartPost=/sbin/ip -6 addr add V6_PREFIX dev br-ex
ExecStop=/sbin/dhclient -r -v -pf /run/dhclient6.pid -cf /etc/dhcp/dhclient6.conf -lf /var/lib/dhcp/dhclient6.leases -6 eno1
PIDFile=/run/dhclient6.pid

[Install]
WantedBy=multi-user.target network-online.target
```
Since we want OpenStack to handle IPv6 itself, we ask dhclient6 to record the IPv6 leases for the br-ex interface, which serves as a bridge between OpenStack and the physical network. OpenStack must therefore be deployed before starting our new service.
Now that we have all the right configuration, it's time to take the plunge with a `./stack.sh`. The installation lasts about 40 minutes, and we end up with a properly configured OpenStack. We can now start dhclient6, with a small `systemctl enable dhclient6 ; systemctl start dhclient6`.
Try creating an instance connected to the public network, open ICMP and 22/tcp from `::/0`, and if everything went well, you should be able to access your instance! 🎉 If not, it might be worth running a `netplan apply` followed by a `systemctl restart dhclient6`: the announcement is sometimes capricious.
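As a sketch, opening ICMP and 22/tcp can also be done from the CLI (this assumes the project's `default` security group and that your OpenStack credentials are already loaded, e.g. via DevStack's `openrc`):

```shell
# Allow ICMPv6 and SSH from anywhere on the "default" security group.
# Assumes credentials are sourced, e.g. ". openrc admin admin".
openstack security group rule create --ethertype IPv6 \
  --protocol ipv6-icmp --remote-ip ::/0 default
openstack security group rule create --ethertype IPv6 \
  --protocol tcp --dst-port 22 --remote-ip ::/0 default
```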
Are you sure about the HDDs?
The first TP starts and… everything collapses within ten minutes. Both OpenStacks start lagging severely as soon as the first instance is launched, and the entire infra breaks down. The problem actually came from two different sources:
- Physical hard disks: I don't know exactly how old the disks at Online are, but they must have seen some action. Write throughput was too low to support such a load (we're talking about 12 VMs per server, on an infra at less than €20 per month; hard to complain without being in bad faith). My fault on this one: I should have stayed consistent with the test environment, which was on SSD. So I decided to cancel my two START-2-M-SATA and buy START-2-M-SSD instead (identical in every way, except the 1 TB of HDD becomes 250 GB of SSD).
- Cinder for `/dev/vda`: on OpenStack, you have two ways to store a VM's data. Either you create a Cinder block volume, mounted as `/dev/vda`, which holds your entire OS; or you don't, and in that case you write directly to the disk of the physical machine running the instance, wherever there is space. Intuitively, Cinder seems like the better solution: not only can you keep your instance data when you destroy and recreate it (of what real use? another debate), but you also assign a dedicated block to the instance, rather than letting the underlying hypervisor decide for you. However, in our case, Cinder's "remote" disk is also our local disk, and apparently Cinder hates provisioning a lot of space on disks where it isn't the sole master. Many instances were failing to be created because Cinder couldn't allocate a volume. The workaround is simple: stop creating a Cinder volume when starting an instance.
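In Horizon this means setting "Create New Volume" to "No" at launch time; from the CLI, booting directly from an image skips Cinder entirely. A sketch (image, flavor, and network names below are examples, not values from my deployment):

```shell
# Boot onto the hypervisor's local disk, with no Cinder volume.
# Image, flavor, and network names are examples; adjust to your cloud.
openstack server create \
  --image cirros-test-image \
  --flavor m1.small \
  --network public \
  demo-instance
```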
With these small modifications (and two nights spent reinstalling the infra on SSD), we have our two functional OpenStacks, and the students were able to complete their TPs 🎉.
Help, everything was working and suddenly it breaks!
The SLAAC IPv6 announcement is… suffering, let’s say, at Scaleway. And without your SLAAC IPv6, it’s impossible to get DHCPv6 leases (don’t ask me why, I have no idea).
If you suddenly lose your IPv6, or tools like KeyCDN return mixed results (pings passing in some regions but not others), check your configuration with `ip -6 a`. If you no longer see your SLAAC IPv6, recreate it (the address is available on your Online console):

```shell
ip -6 addr add YOUR_V6_SLAAC/64 dev eno1 mngtmpaddr noprefixroute
```

If that doesn't solve the problem, you can also try changing the default route:

```shell
ip -6 route del default
ip -6 route add default via fe80::a293:51ff:feb7:5b1d dev br-ex metric 1024 pref medium
```
To conclude
Of course, everything is far from perfect, starting with the deployment, which I haven't automated yet (I plan to write an Ansible playbook for it). Also worth noting: the two OpenStacks each live their own life, side by side. I saw that DevStack allows creating "all-in-one" OpenStack clusters; that's definitely a subject for next year, to simplify usage for the students and better distribute the load (currently, I split the students across the two OpenStacks, with specific rights on each).
IPv6 gave me a hard time, and I had to order an extra small machine (at €3 per month) to allow students to have an SSH jump host, since the university network still doesn’t offer proper IPv6 outgoing access 😮💨.
Also worth noting is my mistake, which cost me a little over €60 for servers I won’t use. The Elastic Metal tests cost me a little less than €10, and maintaining the infrastructure each month costs me about €35, not counting the €35 installation fee. Given that the TPs end in early December and I plan to cut access on 01/01/2025, the total cost can be estimated at:
- €10 for Elastic Metal;
- €60 for mistakes;
- €35 setup fees;
- 3*€35 for OpenStack servers;
- 3*€3 for jump host;
- €0.99 for the domain name (I have a `.ovh` for the students to have some fun).
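The arithmetic above can be reproduced with a quick sketch (in euro cents, to avoid floating point):

```shell
# Reproducing the cost estimate above, in euro cents.
elastic_metal=1000               # Elastic Metal lab tests
mistakes=6000                    # the two cancelled START-2-M-SATA servers
setup_fees=3500                  # Dedibox installation fees
openstack_servers=$((3 * 3500))  # the two Dediboxes, ~35 EUR/month, 3 months
jump_host=$((3 * 300))           # SSH jump host, 3 EUR/month, 3 months
domain=99                        # the .ovh domain

total=$((elastic_metal + mistakes + setup_fees + openstack_servers + jump_host + domain))
printf '%d.%02d EUR\n' $((total / 100)) $((total % 100))

# Next year, without the lab tests and the HDD blunder:
next_year=$((total - elastic_metal - mistakes))
printf '%d.%02d EUR\n' $((next_year / 100)) $((next_year % 100))
```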
That’s €219.99 for infrastructure for three months of practical work. The sum might seem large, but in reality, given the need and the power obtained for so little, it is actually a very small sum! We have, after all, two Infrastructures as a Service available 24/7 for students to work on their TPs at home, all without worrying about hardware the rest of the time. Without my blunder, the cost would have been even lower, which means that for next year, I could get away with €149.99 (and maybe even less if the university finally provides IPv6… 👀).
All of this on French providers, which is important to me. Proof that when you want to do something, you can do it 😉.