The Story so far

In the previous post, we set up the OpenStack L2GW plugin to talk to an OVSDB VTEP switch, which maps VXLANs on the OpenStack deployment to VLANs on the physical data centre network. We were able to ping from VM1 to the external host VM3, but ping from VM3 to VM2 would not work until VM2 had first initiated a ping to VM3. The reason is that the OpenStack controller does not flood unknown broadcast or multicast traffic, in this case the ARP request from VM3 looking for the MAC address of VM2. This post goes through how to troubleshoot Neutron and OpenVirtualSwitch networking on OpenStack, and covers the commands needed to understand how it all fits together.

But before we start, here is a diagram giving an overview of what the architecture looks like once the L2GW has been set up and is operational. You'll see the "vxln74" interface that was created on the CumulusVX switch, as well as the "br-vxln74" bridge interface, which bridges "vxln74" and "swp1" together. The RED dashed line is the path the tunnelled traffic takes to get from the OpenStack Controller to the CumulusVX switch. The GREEN dashed line is the virtual network extension between the OpenStack VXLAN and VLAN 114 on the physical data centre network.

[Diagram: overview of the architecture once the L2GW is operational]

OpenVirtualSwitch

Before we look at why ARP does not work from VM3 to the OpenStack VMs 1 and 2, we first need to get to grips with the networking internals of OpenStack, in particular OpenVirtualSwitch (Open vSwitch, or OVS). There are plenty of tutorials out there on how to enable and configure the OpenVirtualSwitch plugin, so let's jump straight into how it works.

OVS sets up the following bridges as required on each Openstack Node.

  • br-ex - bridges the internal networking of your OpenStack deployment to the external physical interface of the host machine
  • br-int - the bridge that the internal OpenStack VMs connect to
  • br-tun - connects the tunnel overlay network to the internal OpenStack network that the VMs are on

Hopefully this diagram makes it a bit clearer. You can see br-ex bridging the internal networking to the physical host's network interface. The purple dashed line between the br-tun bridges represents the GRE tunnelling between the hosts; in the deployment used in this post the tunnels are VXLAN, as the port names further down show.

[Diagram: the br-ex, br-int and br-tun bridges on each node]

Let's see what flows have been configured on the br-tun bridge. To do this, run ovs-ofctl dump-flows br-tun on the node you want to investigate. In this scenario, let's do this on the Controller node.

NOTE: The output has been edited to make it easier to read by separating the flows into their respective tables.

 

ovs-ofctl dump-flows br-tun
table=0, n_packets=228, n_bytes=24067, priority=1,in_port="patch-int" actions=resubmit(,2)
table=0, n_packets=8734, n_bytes=2874398, priority=1,in_port="vxlan-c0a87a05" actions=resubmit(,4)
table=0, n_packets=0, n_bytes=0, priority=1,in_port="vxlan-c0a87a14" actions=resubmit(,4)
table=0, n_packets=0, n_bytes=0, priority=0 actions=drop

table=2, n_packets=161, n_bytes=18805, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
table=2, n_packets=67, n_bytes=5262, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)

table=4, n_packets=8734, n_bytes=2874398, priority=1,tun_id=0x4a actions=mod_vlan_vid:1,resubmit(,10)
table=4, n_packets=0, n_bytes=0, priority=0 actions=drop

table=10, n_packets=8734, n_bytes=2874398, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xc1aa7f968895b11c,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:"patch-int"

table=20, n_packets=0, n_bytes=0, priority=2,dl_vlan=1,dl_dst=fa:16:3e:8b:4f:7c actions=strip_vlan,load:0x4a->NXM_NX_TUN_ID[],output:"vxlan-c0a87a05"
table=20, n_packets=160, n_bytes=18763, priority=2,dl_vlan=1,dl_dst=fa:16:3e:60:e0:24 actions=strip_vlan,load:0x4a->NXM_NX_TUN_ID[],output:"vxlan-c0a87a05"
table=20, n_packets=1, n_bytes=42, hard_timeout=300, priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:93:07:87 actions=load:0->NXM_OF_VLAN_TCI[],load:0x4a->NXM_NX_TUN_ID[],output:"vxlan-c0a87a05"
table=20, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,22)

table=22, n_packets=21, n_bytes=1874, priority=1,dl_vlan=1 actions=strip_vlan,load:0x4a->NXM_NX_TUN_ID[],output:"vxlan-c0a87a14",output:"vxlan-c0a87a05"
table=22, n_packets=37, n_bytes=2590, priority=0 actions=drop
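The per-table grouping shown above was done by hand. As a rough sketch, a small filter can do the same job on raw output. The function name split_by_table is made up for this example, and it assumes the flows are listed in table order, as they are above.

```shell
# Print a blank line whenever the table number changes, so that each
# OpenFlow table forms its own block in the output.
# split_by_table is a hypothetical helper name, not part of OVS.
split_by_table() {
    awk '{
        t = ""
        # The first "table=N" on a line is the table the flow lives in.
        if (match($0, /table=[0-9]+/)) t = substr($0, RSTART, RLENGTH)
        if (NR > 1 && t != prev) print ""
        print
        prev = t
    }'
}
```

On a node this would be used as ovs-ofctl dump-flows br-tun | split_by_table. To look at a single table only, ovs-ofctl also accepts a match such as table=4 after the bridge name.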

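Starting with Table 0, you can see that packets arriving on the tunnel ports "vxlan-c0a87a05" and "vxlan-c0a87a14" are resubmitted to Table 4, while packets arriving from "patch-int" go to Table 2.

Table 4 matches packets that have a tunnel ID of 0x4a (which is 74 in decimal), sets the VLAN ID to 1 and then submits the packet to Table 10. Table 10 is where MAC learning happens: the learn action installs a flow in Table 20 that matches the packet's source MAC address as a destination and sends such traffic back out of the tunnel port the packet arrived on. The packet is then output on port "patch-int".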
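Because OVS prints tunnel IDs in hex while the VXLAN network ID is usually known in decimal, it can help to convert before searching the flow dump. The following is only a sketch: flows_for_vni is a made-up helper name, and it simply keeps lines that either match on the tunnel ID or load it into NXM_NX_TUN_ID.

```shell
# Keep only the flows that match on, or load, the tunnel ID of a VNI.
# The VNI is given in decimal and converted to the hex form OVS prints.
# flows_for_vni is a hypothetical helper name, not part of OVS.
flows_for_vni() {
    hex=$(printf '0x%x' "$1")
    grep -E "tun_id=${hex}([^0-9a-f]|\$)|load:${hex}->NXM_NX_TUN_ID"
}

printf '0x%x\n' 74    # decimal VNI to hex: prints 0x4a
printf '%d\n' 0x4a    # and back again: prints 74
```

On a node this would be used as ovs-ofctl dump-flows br-tun | flows_for_vni 74.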

But what are these "vxlan-c0a87a05" and "vxlan-c0a87a14" interfaces?

ovs-vsctl show 
...
    Bridge br-tun
...
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-c0a87a14"
            Interface "vxlan-c0a87a14"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="192.168.122.2", out_key=flow, remote_ip="192.168.122.20"}
        Port "vxlan-c0a87a05"
            Interface "vxlan-c0a87a05"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="192.168.122.2", out_key=flow, remote_ip="192.168.122.5"}

The output shows us that port vxlan-c0a87a05 is the tunnel between the OpenStack Controller node (192.168.122.2) and the Compute node (192.168.122.5). Port vxlan-c0a87a14 is the tunnel between the Controller node and the VTEP OVSDB switch (192.168.122.20).
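Comparing the port names with the remote_ip values above suggests that the eight hex digits in each name are simply the remote endpoint's IPv4 address. A minimal sketch that decodes a name on that assumption (vxlan_port_to_ip is a made-up helper name):

```shell
# Decode a tunnel port name of the form vxlan-<8 hex digits> into the
# dotted-quad IPv4 address that the hex digits encode.
# vxlan_port_to_ip is a hypothetical helper name, not part of OVS.
vxlan_port_to_ip() {
    hex="${1#vxlan-}"                      # strip the "vxlan-" prefix
    o1=$(printf '%s' "$hex" | cut -c1-2)   # two hex digits per octet
    o2=$(printf '%s' "$hex" | cut -c3-4)
    o3=$(printf '%s' "$hex" | cut -c5-6)
    o4=$(printf '%s' "$hex" | cut -c7-8)
    printf '%d.%d.%d.%d\n' "0x$o1" "0x$o2" "0x$o3" "0x$o4"
}

vxlan_port_to_ip vxlan-c0a87a05   # prints 192.168.122.5
vxlan_port_to_ip vxlan-c0a87a14   # prints 192.168.122.20
```

This saves a trip to ovs-vsctl show when a flow dump only gives you the port name.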

So, let's see how this affects ARP traffic coming from the external VM3 when it is ARPing for VMs 1 and 2 on the OpenStack deployment. OpenVirtualSwitch includes a great tool that simulates a packet and shows how the switch would handle it.

We want to see what the OpenVirtualSwitch will do with an ARP request coming from the VTEP OVSDB switch, arriving on port vxlan-c0a87a14. To do this we use ovs-appctl ofproto/trace and specify a few parameters. In this instance we specify:

  • the bridge (br-tun)
  • the input port the packet will arrive on (in_port="vxlan-c0a87a14")
  • the type of packet (arp)
  • the tunnel ID (tun_id=0x4a)

ovs-appctl ofproto/trace br-tun in_port="vxlan-c0a87a14",arp,tun_id=0x4a
Flow: arp,tun_id=0x4a,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,arp_spa=0.0.0.0,arp_tpa=0.0.0.0,arp_op=0,arp_sha=00:00:00:00:00:00,arp_tha=00:00:00:00:00:00

bridge("br-tun")
----------------
 0. in_port=2, priority 1, cookie 0xc1aa7f968895b11c
    goto_table:4
 4. tun_id=0x4a, priority 1, cookie 0xc1aa7f968895b11c
    push_vlan:0x8100
    set_field:4097->vlan_vid
    goto_table:10
10. priority 1, cookie 0xc1aa7f968895b11c
    learn(table=20,hard_timeout=300,priority=1,cookie=0xc1aa7f968895b11c,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[])
     >> suppressing side effects, so learn action ignored
    output:1

bridge("br-int")
----------------
 0. priority 0, cookie 0x763d21bab4b2b0d6
    goto_table:60
60. priority 3, cookie 0x763d21bab4b2b0d6
    NORMAL
     -> no learned MAC for destination, flooding

bridge("br-ex")
---------------
 0. in_port=1, priority 2, cookie 0x6a615df2d356af00
    drop

As seen in the output, the packet arrives inbound on port 2 and hits Table 0, which sends it to Table 4. Table 4 tags the packet with VLAN 1 (set_field:4097->vlan_vid is 0x1000 + 1, that is VLAN ID 1 with the "VLAN present" bit set) and sends it on to Table 10, where it is then output to port 1 (which is patch-int, the interface that connects to br-int). In Table 0 on br-int the packet is sent to Table 60, where it is flooded out of all ports on that bridge, including the one towards br-ex. At br-ex, the packet is dropped.
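The trace above leaves the MAC and ARP fields at zero. To trace something closer to a real ARP request, the flow description can carry a broadcast destination and the ARP addresses as well. The sketch below only builds that string. arp_request_flow is a made-up helper name, and the MAC and IP values are assumptions read off the datapath flows later in this post (52:54:00:06:8c:bd as VM3, 192.168.3.114 as its address, 192.168.3.10 as the target), so substitute your own.

```shell
# Build an ofproto/trace flow description for a broadcast ARP request.
# Arguments: input port, tunnel ID, sender MAC, sender IP, target IP.
# arp_request_flow is a hypothetical helper name, not part of OVS.
arp_request_flow() {
    printf 'in_port=%s,arp,tun_id=%s,dl_src=%s,dl_dst=ff:ff:ff:ff:ff:ff,arp_op=1,arp_sha=%s,arp_spa=%s,arp_tpa=%s\n' \
        "$1" "$2" "$3" "$3" "$4" "$5"
}

# Assumed example values; they may not match your deployment.
arp_request_flow vxlan-c0a87a14 0x4a 52:54:00:06:8c:bd 192.168.3.114 192.168.3.10
```

The result would be passed to the tracer as ovs-appctl ofproto/trace br-tun "$(arp_request_flow ...)". All of the field names used here appear in the Flow: line of the trace output above.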

Let's look at the port numbering on br-tun to help clarify what the previous output has shown us. The command ovs-ofctl show <bridge> really helps to map port numbers to port names.

ovs-ofctl show br-tun
...
 1(patch-int): addr:da:01:2c:08:f1:bf
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 2(vxlan-c0a87a14): addr:ea:d5:63:85:a5:aa    <---- packet arrives on this port
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 3(vxlan-c0a87a05): addr:9e:d9:4c:43:81:c1
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-tun): addr:3e:72:52:51:35:40
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
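When a bridge has many ports, it can be easier to reduce this output to a plain number-to-name table. A minimal sketch, assuming the port lines keep the " N(name): addr:..." layout shown above (ofport_names is a made-up helper name):

```shell
# Reduce "ovs-ofctl show <bridge>" output to "<port number> <port name>"
# pairs. Lines that do not start with a numeric port, such as the LOCAL
# port and the config/state/speed details, are skipped.
# ofport_names is a hypothetical helper name, not part of OVS.
ofport_names() {
    sed -n 's/^ *\([0-9][0-9]*\)(\([^)]*\)):.*/\1 \2/p'
}
```

On a node this would be used as ovs-ofctl show br-tun | ofport_names, which for the output above gives 1 patch-int, 2 vxlan-c0a87a14 and 3 vxlan-c0a87a05, one pair per line.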

I found that the command ovs-vsctl show really helps to see how the ports on the three internal bridges are connected. Here is a more detailed diagram to help illustrate the bridges.

[Diagram: detailed view of how the ports on br-ex, br-int and br-tun are connected]

Now that you can visualise the internal switches, let's get back to the ARP request from VM3 arriving via the VTEP OVSDB switch. Firstly, the CumulusVX switch needs to know to send broadcast traffic to the Control node (or to the Network node, if you have a dedicated one). If you dump the VTEP database and look at the Mcast_Macs_Remote table, you'll see that it is empty.

sudo ovsdb-client dump --pretty tcp:192.168.122.20:6640
...

Mcast_Macs_Remote table
MAC _uuid ipaddr locator_set logical_switch
--- ----- ------ ----------- --------------

Let's insert the information needed to send broadcast and multicast traffic to the Control node of the OpenStack deployment. We do this with the command sudo vtep-ctl add-mcast-remote <UUID of logical switch> unknown-dst <ip address of control node>. You can find the UUID of the logical switch by entering the first command below; the second command then adds the entry:

sudo vtep-ctl list-ls
71b65a58-4635-470c-ba1d-66776e666602
sudo vtep-ctl add-mcast-remote 71b65a58-4635-470c-ba1d-66776e666602 unknown-dst 192.168.122.2

Now we have the following entry in the database.

sudo ovsdb-client dump --pretty tcp:192.168.122.20:6640
...

Mcast_Macs_Remote table
MAC         _uuid                                ipaddr locator_set                          logical_switch                      
----------- ------------------------------------ ------ ------------------------------------ ------------------------------------
unknown-dst 51b63356-e9a0-4d04-aab0-89f986135930 ""     a5ddb969-7aa3-4aaf-95aa-9440f409e59a 45528526-ea00-4ada-93d7-f8a5d9b48107

The ARP request is now sent to the Control node, where it arrives on port 2, is sent out of port 1 and is flooded on br-int. However, it does not get sent out of port 3, the tunnel to the Compute node, so we'll need to add a flow manually to make that happen. So which table should we send the packet to for it to be sent out of "vxlan-c0a87a05"? If you look at the output of ovs-ofctl dump-flows br-tun, you'll note that Table 22 sends the packet out of "vxlan-c0a87a05" and "vxlan-c0a87a14".

ovs-ofctl dump-flows br-tun
 ...
table=22, n_packets=21, n_bytes=1874, priority=1,dl_vlan=1 actions=strip_vlan,load:0x4a->NXM_NX_TUN_ID[],output:"vxlan-c0a87a14",output:"vxlan-c0a87a05"
table=22, n_packets=37, n_bytes=2590, priority=0 actions=drop

We'll need to add a flow in Table 4 to send the packet on to Table 10 as well as Table 22. We do this by adding the following flow.

ovs-ofctl add-flow br-tun "table=4,tun_id=0x4a,priority=2,actions=mod_vlan_vid:1,resubmit(,10),resubmit(,22)"

Now verify that you see the new flow in Table 4.

ovs-ofctl dump-flows br-tun
...
table=4, n_packets=12608, n_bytes=4125695, priority=1,tun_id=0x4a actions=mod_vlan_vid:1,resubmit(,10)
table=4, n_packets=0, n_bytes=0, priority=2,tun_id=0x4a actions=mod_vlan_vid:1,resubmit(,10),resubmit(,22)

If you start a ping from VM3 on the external host to either VM1 or VM2 on the OpenStack deployment, and dump the datapath flows on the Compute node, you will see the ARP request, the ARP reply and the ICMP packets.

ovs-dpctl dump-flows 
<ICMP replies>
recirc_id(0),in_port(4),eth(src=fa:16:3e:60:e0:24,dst=52:54:00:06:8c:bd),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:2, bytes:196, used:0.328s, actions:set(tunnel(tun_id=0x4a,src=192.168.122.5,dst=192.168.122.20,ttl=64,tp_dst=4789,flags(df|key))),5

<ARP Reply>
recirc_id(0),tunnel(tun_id=0x4a,src=192.168.122.20,dst=192.168.122.5,flags(-df-csum+key)),in_port(5),eth(src=52:54:00:06:8c:bd,dst=fa:16:3e:60:e0:24),eth_type(0x0806), packets:0, bytes:0, used:never, actions:4

<ICMP Echo>
recirc_id(0),tunnel(tun_id=0x4a,src=192.168.122.20,dst=192.168.122.5,flags(-df-csum+key)),in_port(5),eth(src=52:54:00:06:8c:bd,dst=fa:16:3e:60:e0:24),eth_type(0x0800),ipv4(frag=no), packets:2, bytes:196, used:0.329s, actions:4

<ARP request>
recirc_id(0),in_port(4),eth(src=fa:16:3e:60:e0:24,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.3.10,tip=192.168.3.114,op=1/0xff), packets:0, bytes:0, used:never, actions:set(tunnel(tun_id=0x4a,src=192.168.122.5,dst=192.168.122.2,ttl=64,tp_dst=4789,flags(df|key))),5,set(tunnel(tun_id=0x4a,src=192.168.122.5,dst=192.168.122.20,ttl=64,tp_dst=4789,flags(df|key))),5,push_vlan(vid=1,pcp=0),1,pop_vlan,3
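The labels in angle brackets above were added by hand. On a busy node, a simple filter on the eth_type field can pick out the ARP (0x0806) or IPv4 (0x0800) entries instead. This is only a sketch, and dp_flows_by_ethertype is a made-up helper name.

```shell
# Keep only the datapath flows that carry the given EtherType,
# for example 0x0806 for ARP or 0x0800 for IPv4.
# dp_flows_by_ethertype is a hypothetical helper name, not part of OVS.
dp_flows_by_ethertype() {
    grep -F "eth_type($1)"
}
```

On a node this would be used as ovs-dpctl dump-flows | dp_flows_by_ethertype 0x0806. Datapath flows expire quickly when idle, so run it while the ping is still going.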

 

Summary

Troubleshooting a physical network infrastructure is one thing: you follow the packet through each hop and diagnose each step of the way to resolve connectivity problems. SDN adds a whole new dimension to this, and it is critical that network engineers understand the tools available to help them diagnose packet flows in a virtual environment. I hope that the commands given here will give network engineers new to SDN some idea of where to start when diagnosing packet flows on OpenVirtualSwitch.

List of commands

  • ovs-ofctl dump-flows <bridge>  -  shows the OpenFlow tables programmed on the virtual switch
  • ovs-vsctl show  -  prints the port layout of each bridge of the virtual switch
  • ovs-appctl ofproto/trace <bridge> <parameters>  -  shows how OpenFlow will handle a specific flow through each bridge
  • ovs-ofctl show <bridge>  -  shows the OpenFlow port numbers on the specified bridge
  • ovsdb-client dump --pretty <connection, e.g. tcp:<ip>:<port>>  -  dumps the contents of the OVS database on the virtual switch
  • ovs-ofctl add-flow <bridge> <parameters>  -  programs a flow into the specified bridge
  • ovs-dpctl dump-flows  -  dumps the datapath flows the switch is handling at that moment

In this post, you followed an ARP packet from the external VM, through the CumulusVX switch, to the OpenStack Controller node, through its br-tun bridge, and finally saw the ARP request and replies on the OpenStack Compute node. You saw how to dump a flow table, dump a switch's database, trace a flow and program a flow.

 
