OpenStack: instance cannot get network info from the metadata service

This post assumes some working knowledge of OpenStack networking

When you deploy a new VM, the cloud-init script running inside it retrieves useful information (hostname, routing table, etc.) from the Neutron metadata service.

This service runs inside a dedicated network namespace, which can live either on the virtual network only or on both the virtual and the external network, depending on how you configured it. Basically, if you run ip netns, you'd see two namespaces starting with qdhcp-, each running a dnsmasq process acting as a DHCP server.
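To verify this yourself, you can list the namespaces and peek inside one of them. This is a minimal sketch; qdhcp-&lt;network-id&gt; is a placeholder for one of the actual namespace names you'll see:

```shell
# List the network namespaces created by Neutron.
# The qdhcp-* ones belong to the DHCP agent, one per network.
ip netns

# Inspect one of them (replace <network-id> with the UUID from the
# namespace name above) and confirm dnsmasq is serving it:
ip netns exec qdhcp-<network-id> ip addr
ps aux | grep [d]nsmasq
```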

Now, the way the VMs communicate, in general and with this service in particular, is by exploiting some form of network virtualization; usually it's through VXLAN tunnels.

Quoting from Wikipedia:

Virtual Extensible LAN (VXLAN) is a network virtualization technology that attempts to address the scalability problems associated with large cloud computing deployments. It uses a VLAN-like encapsulation technique to encapsulate OSI layer 2 Ethernet frames within layer 4 UDP datagrams, using 4789 as the default IANA-assigned destination UDP port number. VXLAN endpoints, which terminate VXLAN tunnels and may be either virtual or physical switch ports, are known as VXLAN tunnel endpoints (VTEPs).

So, basically, the VM's cloud-init sends a DHCP Discover to that dnsmasq process, and this message is encapsulated in a VXLAN tunnel to be carried over the network all the way to the controller.
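If you suspect the packets are not making it across, a quick way to check is to sniff for VXLAN traffic on the underlying interface. This is a sketch; eth0 is a placeholder for your actual tunnel interface:

```shell
# Watch for VXLAN-encapsulated traffic on both the IANA port (4789)
# and the Linux kernel default port (8472).
tcpdump -ni eth0 'udp port 4789 or udp port 8472'
```

Run it on both ends while the VM boots: if you see traffic leaving the compute node but nothing arriving on the controller, something in between (typically a firewall) is dropping it.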

However, on some restrictive systems, such as CentOS, a firewall (usually firewalld) may be running, and if the proper port is not open, the packets may be discarded.

The culprit here is that 4789 is not always the port actually being used:

The default port for UDP traffic between VXLAN tunnel endpoints varies depending on the system. The Internet Assigned Numbers Authority, or IANA, has assigned UDP port 4789 for the purposes of VXLAN and that is the default port used by Open vSwitch. The Linux kernel, on the other hand, uses UDP port 8472 for VXLAN.
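You can check which port your VXLAN interface is actually using with ip -d. The interface name below (vxlan-96) is just an example; use whatever ip link shows on your host:

```shell
# -d (details) prints the VXLAN attributes, including the UDP
# destination port; on older iproute2 versions a dstport of 0
# means the kernel default, i.e. 8472.
ip -d link show vxlan-96 | grep -o 'dstport [0-9]*'
```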

If, as in our case, the Linux bridge agent is being used, the port to be opened is 8472. You can achieve this by running the following commands on every host constituting the OpenStack cluster:

firewall-cmd --permanent --add-port=8472/udp
firewall-cmd --reload
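You can then double-check that the rule is active:

```shell
# The port should now appear in the runtime configuration:
firewall-cmd --list-ports
```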

Bonus

If your VM can't boot (e.g. it hangs at the message Booting from hard disk...), you could force the machine type. In our case it couldn't boot because OpenStack was running on an old version of vSphere.

First of all, run virsh capabilities on the compute node where you want to deploy your VM. The output should include something like this for the x86_64 architecture, and something similar for the other architectures.

    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <machine maxCpus='240'>pc-i440fx-rhel7.6.0</machine>
      <machine canonical='pc-i440fx-rhel7.6.0' maxCpus='240'>pc</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.0.0</machine>
      <machine maxCpus='384'>pc-q35-rhel7.6.0</machine>
      <machine canonical='pc-q35-rhel7.6.0' maxCpus='384'>q35</machine>
      <machine maxCpus='240'>rhel6.3.0</machine>
      <machine maxCpus='240'>rhel6.4.0</machine>
      <machine maxCpus='240'>rhel6.0.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.5.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.1.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.2.0</machine>
      <machine maxCpus='255'>pc-q35-rhel7.3.0</machine>
      <machine maxCpus='240'>rhel6.5.0</machine>
      <machine maxCpus='384'>pc-q35-rhel7.4.0</machine>
      <machine maxCpus='240'>rhel6.6.0</machine>
      <machine maxCpus='240'>rhel6.1.0</machine>
      <machine maxCpus='240'>rhel6.2.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.3.0</machine>
      <machine maxCpus='240'>pc-i440fx-rhel7.4.0</machine>
      <machine maxCpus='384'>pc-q35-rhel7.5.0</machine>
      <domain type='qemu'/>
      <domain type='kvm'>
        <emulator>/usr/libexec/qemu-kvm</emulator>
      </domain>
    </arch>
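Since the full capabilities XML is quite verbose, a quick way to pull out just the machine types is:

```shell
# Keep only the <machine> entries from the capabilities XML.
virsh capabilities | grep -o '<machine[^<]*</machine>'
```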

See if your machine type is supported; in our case the image was CentOS 7.1804, which matched pc-i440fx-rhel7.5.0.

Then open /etc/nova/nova.conf and add the line hw_machine_type = x86_64=pc-i440fx-rhel7.5.0 to the [libvirt] section.
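The resulting section would look like this (the value is the machine type that matched our image; use whichever matched yours):

```ini
[libvirt]
# Force the machine type for x86_64 guests
# (value taken from the virsh capabilities output above)
hw_machine_type = x86_64=pc-i440fx-rhel7.5.0
```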

Restart the nova and the libvirt services by running

systemctl restart libvirtd.service openstack-nova-compute.service

Once you've set the machine type, from the OpenStack controller machine you should tag your image with the hw_disk_bus and hw_vif_model properties. In our case we had to do this:

openstack image set --property hw_disk_bus=ide --property hw_vif_model=e1000 <image-id>

You can get the image ID by running

glance image-list

(the newer unified client equivalent is openstack image list).

Finally, and hopefully, you should be ready to go 🍾πŸ₯‚
