The Lonely Packet

Enviroplus AQI Sensor data streaming to Splunk

(Air) Quality of Life

The quality of the air we breathe is something we very often take for granted. We open the windows to get "fresh air" into the house. We go for long walks in the countryside to fill our lungs with clean air. But how clean is the air we are breathing? Tech companies are starting to display Air Quality Index (AQI) data in their weather and transit maps, and smart home fans and air purifiers measure particles and gases and can warn you when readings are high.

 

Roll on Christmas 2020, lockdown number whatever and three weeks of spare time: what better time to experiment with building my own air quality meter.

 

The Idea

Hook up some sensors to a Raspberry Pi, store the data in a database, graph the data, and apply machine learning to it with the help of Splunk's Machine Learning Toolkit (MLTK). That last little bit of the plan was a stretch. I don't have an "always on" machine in my house, but I do have a VM running on Google Cloud, so that was the perfect place to install Splunk. More on that later.

 

Here's a diagram of the layout of this little project:

[Diagram: project layout]

 

Sensors

My soldering and electrical skills are not the best. It's a miracle I've made it this far in life. I did some searching for a board that had the sensors already integrated and came across the Enviro+ Air Quality sensor board available from Pimoroni. This was perfect for the Pi 3 I had lying around from my Amazon Alexa experimentation. There is a handy GitHub repo for the Enviro+ with installation scripts and some Python examples to get you up and running in no time.

One of the examples includes an MQTT client, which publishes the sensor readings to a message bus where other applications can subscribe to them. This was interesting to me because it's very efficient, and the same data can be consumed by both a graphing engine and an analytics engine.
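To give a feel for the publishing side, here is a minimal sketch of an MQTT publisher, assuming the paho-mqtt (1.x) library, a broker on the Pi itself, and dummy values standing in for the real sensor calls (the Enviro+ repo's example handles those):

import json
import time

import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("localhost", 1883)  # broker address and default MQTT port

while True:
    # Dummy values; the Enviro+ library provides the real readings
    payload = {"temperature": 21.4, "humidity": 48.2, "pm25": 4.0,
               "serial": "0000000012345678"}
    client.publish("enviroplus", json.dumps(payload))  # the topic Telegraf subscribes to below
    time.sleep(5)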

 

Database

Knowing that the data could be "streamed", I looked at using the InfluxData tools Telegraf and InfluxDB. Telegraf has many built-in input plugins which allow it to collect multiple data types and format the data to be consumed by other applications (such as InfluxDB).

 

Setting up Telegraf to subscribe to the stream was straightforward. Give it the IP address and port of the server streaming the data, the topic to subscribe to, and the format the data is in.

[[inputs.mqtt_consumer]]
  name_override = "sensors"
  name_prefix = "influx"
  servers = ["tcp://192.168.1.112:1883"]
  qos = 0
  connection_timeout = "30s"
  topics = [
    "enviroplus",
  ]

  data_format = "json"
  json_string_fields = ["serial"]

Next, configure Telegraf to send the data to InfluxDB, along with which database to use.

[[outputs.influxdb]]
   urls = ["http://127.0.0.1:8086"]
   database = "enviroplus"

 A few seconds later, the data is available in the database.

[Screenshot: sensor data in InfluxDB]
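If you'd rather check from the command line than a GUI, a quick sanity check with the InfluxDB 1.x CLI might look like this (the "influxsensors" measurement name is my assumption, based on the name_override and name_prefix settings above):

influx -database enviroplus -execute 'SHOW MEASUREMENTS'
influx -database enviroplus -execute 'SELECT * FROM "influxsensors" LIMIT 5'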

 

Graphing the Data

Grafana is a great graphing engine. It's not perfect, but it's fairly straightforward to set up a basic graph. Once you set up the data source to point to the InfluxDB database, you select that data source in the graph panel query, choose the visualisation that best displays the data, and soon you'll have a dashboard giving you all the information you're looking for.
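As an illustration, a panel query for particulates might look something like the InfluxQL below. The pm25 field name is an assumption based on what the Enviro+ MQTT example publishes, the measurement name follows from the Telegraf config above, and $timeFilter and $__interval are Grafana's built-in macros:

SELECT mean("pm25") FROM "influxsensors" WHERE $timeFilter GROUP BY time($__interval) fill(null)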

 

[Screenshot: Grafana dashboard]

 

Getting the data to Splunk 

There were a few ways to skin this cat (no cats were hurt in the making of this project). I could have set up Telegraf to stream directly to Splunk, but I decided to install a Splunk forwarder on the Pi and let it monitor an output file that I had configured Telegraf to write to while I was debugging.

 [[outputs.file]]
   files = ["stdout", "/tmp/metrics.out"]

Then it's just a case of telling the Splunk forwarder what file to monitor.

./splunk add monitor /tmp/metrics.out
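The forwarder also needs to be told where to send the data. Assuming it forwards to the default receiving port (9997, exposed on the Splunk server below), that's one more command, with <splunk-server> standing in for the address of your Splunk instance:

./splunk add forward-server <splunk-server>:9997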

 

I'm running Splunk in a Docker container on my Google Cloud VM. While this is a quick way to spin up a Splunk instance for testing, it's not so great when you have to destroy the container, because you lose all of your settings and the data you've ingested. I mapped two local volumes to the container to keep the data persistent between containers, should I ever need to rebuild the instance.

I also exposed port 9997 which is the default port Splunk listens on for receiving forwarded data.

docker run -d -p 8000:8000  -p 9997:9997 -v /home/keithchurchill/splunk/etc:/opt/splunk/etc -v /home/keithchurchill/splunk/var:/opt/splunk/var -e "SPLUNK_START_ARGS=--accept-license" -e "SPLUNK_PASSWORD=<nothing to see here :-)>" --name splunk splunk/splunk:latest start 

 

A few SPL queries later and I have a basic Dashboard up and running. I still need to add a Time Picker and a few other bells and whistles but for now this will do.
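For the curious, one of those SPL queries might look something like the line below. It assumes the pm25 field is being pulled out with a rex extraction, since the Telegraf file output doesn't set a data_format and therefore writes InfluxDB line protocol by default:

source="/tmp/metrics.out" | rex "pm25=(?<pm25>[\d.]+)" | timechart avg(pm25)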

[Screenshot: Splunk dashboard]

 

Next Steps / To-Do

I mentioned earlier that I want to use the Splunk Machine Learning Toolkit on the data. I want to see if I can predict air quality, or see what factors influence the quality of the air around me. In order to do this, I need data. Lots of data. The more data the better.

I'm going to import weather data into Splunk: temperature, wind speed, wind direction, humidity and air pressure. Overlaying this with the air quality data I have, maybe machine learning can find some correlation between the two. If it can, it would be very possible to take a weather forecast and turn it into an air quality forecast specific to my house and its immediate surroundings, factoring in things like agriculture, industry and road traffic.

 

Network Automation with Ansible - Getting set up

Gone are the days where Network Engineers manually deploy the same bits of config to the network, day after day, week after week. The role of a Network Engineer has evolved and the scope of the job is becoming more involved. Network Engineers need to work faster and be more consistent than ever before. This is where automation comes in and can make all the difference. I've only just started my journey and it hasn't been easy. This is because most of the automation tools are written by Software Engineers. People who understand code! To them, the interdependencies of software library versions and the indenting of code are all second nature. As a Network Engineer, it takes some time to get your head around, or at least it did for me.

So I'm going to explain how to set up an environment so that you can play with Ansible, with the focus on network automation, and I will show you what can go wrong (well, what went wrong for me in the beginning). Hopefully, once you're over the initial hurdle of becoming familiar with Ansible, the actual network stuff will be easy.

 

Step 1 - Using Vagrant

"What the heck is Vagrant and why do I need it?" I hear you ask. Well, I don't like installing and experimenting with software on my laptop. I don't like having to worry about if this version of Python is going to break some application I'm using. I don't like wasting time, trawling through forums trying to figure out what's broken since I installed some update. I like to use virtual machines to separate my environments and if something goes wrong, I roll it back to a snapshot, or just destroy the VM and start again with no impact to my work applications.

If you do not already have VirtualBox installed, go ahead and install that first at VirtualBox.org

Next, install Vagrant. Vagrant is a quick way to get a virtual machine set up without having to create the VM and install an operating system on it yourself.

Create a directory that you want to store your Vagrant VM configuration in and then "cd" into that directory. Then type the following:

vagrant init ubuntu/xenial64

This creates a Vagrantfile describing a basic Ubuntu 16.04 VM. To start (and, on first run, download) the VM, type:

vagrant up

To login to the VM, type:

vagrant ssh

 

Wow, so easy! You now have a Linux box to install Ansible on. Let's get started.

 

Step 2a - How NOT to install Ansible

You should really install Ansible as a dedicated user on the VM, but I'm here to learn network automation, not system administration, so I do things the lazy way: "sudo su" to become root on the VM and install Ansible via "apt":

apt install ansible

Wow, so easy. Ansible is now installed and all the config files are in the right locations. It should now just be a case of configuring the Ansible hosts file in "/etc/ansible/hosts" and writing a simple playbook, and off we go. Automation here we come.

WRONG!!

After MANY hours of troubleshooting, running Wireshark captures and not seeing any SSH traffic leave my machine destined for the host I was trying to connect to, it turns out that when Ansible is installed via apt, it does not know how to use the Python Paramiko SSH library to make an SSH connection to the host you're trying to reach.

I'm here for network automation, not system debugging! So, let's try this again. Thank goodness we can just destroy the VM and, in a few seconds, build a fresh one and start again.

vagrant destroy

vagrant up

 

Step 2b - How to install Ansible

Another way to install Ansible is via the Python package manager called pip. Again, I just "sudo su" into root so that I don't have to worry about user permissions on files and folders. First install pip, and then we can install Ansible.

apt install python-pip

Now install Ansible. I've had a lot of success with 2.8.0 so I'm going to stick with that. 

pip install ansible==2.8.0

After a few seconds, Ansible will be installed and you're almost good to go. The next bit of software you'll need to install is the Python SSH connection library called Paramiko:

pip install paramiko
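You can quickly confirm which versions landed before moving on:

ansible --version
pip show paramiko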

 

Step 3 - Setting up Ansible

In your "root" users home directory (the directory you're automatically in when you log into the VM) create a file called ".ansible.cfg" and copy and paste the configuration below which stops Ansible from checking SSH keys. This is important especially in a lab environment were you're going to possibly be connecting to different devices with the same IP address at some point in time.

I use the VIM editor, but you can use whichever text editor you're comfortable with:

vim .ansible.cfg

[defaults]
host_key_checking = False

Next, you need to create an Ansible hosts file, which is located at /etc/ansible/hosts. If the file is not there, create it yourself. This file tells Ansible all about the hosts you'll be automating. What kind of hosts they are, what their IP addresses are, credentials etc... it all goes in here.

There is some very good documentation at docs.ansible.com to follow on how to set up your inventory file. Check it out.

For now, I'm just going to configure two Nokia hosts, SR1 and SR2:

[nokia]
SR1 ansible_host=192.168.1.201
SR2 ansible_host=192.168.1.202
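A quick way to confirm that Ansible can parse your inventory is the ansible-inventory command:

ansible-inventory --graph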

That's that. Let's configure a playbook to do something with these hosts!

 

Step 4 - Playbook Basics

Ansible uses the YAML format to write playbooks. Playbooks are a set of instructions to tell Ansible what to do, to which hosts, when, and what to do if something goes wrong. YAML can be a pig to get used to, but once you get the hang of it, writing playbooks becomes easy.

The beauty of playbooks, and of any automation, is that the more you do, the more code you can re-use and the quicker it becomes to automate tasks. It might take you a week to write your first playbook consisting of many tasks. But for the next playbook you need to write, you can re-use a vast amount of the first one and only adapt the bits you need.

And once you learn how to make use of variables, your playbooks become infinitely more reusable and automating tasks becomes lightning fast.

Let's take a look at a sample Playbook. Create a directory and "cd" into it and then create a file with a .yml extension. This will be the file you call when you run the "ansible-playbook" command. 

---
################################################
# Section to define which hosts the playbook   #
# will run on                                  #
################################################
- hosts: SR1
  gather_facts: no
  connection: local

################################################
# The section below defines variables to use   #
# within the playbook                          #
################################################
  vars:
    cli:
      username: admin
      password: admin
    a_side_port1: 1/1/1
    a_side_port2: 1/1/2
    b_side_port1: 1/1/1
    b_side_port2: 1/1/2

################################################
# The section below defines the tasks to run   #
# against the hosts defined in the first       #
# section                                      #
################################################

  tasks:
  - name: configure description on SR1 port {{a_side_port1}}
    sros_config:
      lines:
        - description "SR1:{{a_side_port1}}-to-SR2:{{b_side_port1}}"
      parents:
        - configure
        - port "{{a_side_port1}}"
      provider: "{{ cli }}"

  - name: configure description on SR1 port {{a_side_port2}}
    sros_config:
      lines:
        - description "SR1:{{a_side_port2}}-to-SR2:{{b_side_port2}}"
      parents:
        - configure
        - port "{{a_side_port2}}"
      provider: "{{ cli }}" 

Indentation is critical in YAML: if a line doesn't have the exact number of spaces, its meaning moves to a different level of the structure, Ansible won't know how to interpret it, and you will get an error message.

You should write your playbooks in a decent Text Editor such as Atom or Sublime Text so that you can ensure the indentation is correct. You should be able to copy the text above and put it into a text editor to verify the indentation.

 

[Screenshot: the playbook above with the variables highlighted]

 

I've defined a few variables in the "vars" section and I pass them to various configuration items in the code via the {{ }} syntax. I've highlighted the variables in the screenshot above. Using variables makes the code very flexible: every time I want to run this playbook, all I do is update the hosts and the variables, simple as that.

What does that playbook do? 

Here is what I would type out on the router to configure the interface descriptions on ports 1/1/1 and 1/1/2:

configure
port 1/1/1
description "SR1:1/1/1-to-SR2:1/1/1"
back
port 1/1/2
description "SR1:1/1/2-to-SR2:1/1/2"

 

In the playbook, I use the sros_config module to send my commands to the router. The "parents" are the commands typed to reach the level at which you want to enter the final command. For example, before I type out the interface description, I have to type "configure" and then "port 1/1/1", and only then the command I actually want to configure: "description SR1:1/1/1-to-SR2:1/1/1".

 

 Step 5 - Run the playbook

Now for the fun bit. At your directory prompt, type ansible-playbook followed by the name of the playbook file you created.

ansible-playbook a-side-interfaces.yml

If your indentation is correct, and your /etc/ansible/hosts file has been set up correctly, you should see something like this:

PLAY [SR1] *************************************************************************************************************************************************

TASK [configure description on SR1 port 1/1/1] ***************************************************************************************************************
changed: [SR1]

TASK [configure description on SR1 port 1/1/2] ***************************************************************************************************************
changed: [SR1]

 

What if it goes wrong?

It's not very often that things work the first time. If you run a playbook and it fails, the error message will indicate either a syntax error or some other underlying problem with the interaction with the end device.
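Syntax problems can be caught before Ansible touches any devices by using the --syntax-check flag:

ansible-playbook --syntax-check a-side-interfaces.yml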

 

If it's not a syntax problem, the best thing to do is run your playbook in verbose mode, which will give you additional debugging information.

To run your Playbook in verbose mode use the -vvv flag as follows:

 

ansible-playbook -vvv a-side-interfaces.yml

 

 

Conclusion

Seems like a lot of work to configure two interface descriptions. Yes, it is. But when you have to update 800 interface descriptions across a network, or configure a new link with ISIS and MPLS, and you do these types of configurations a few times a week, you soon reap the benefits of changing a few variables and hitting "GO".
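And you don't even have to edit the playbook each time. Variables defined in the vars section can be overridden straight from the command line, because extra vars take the highest precedence in Ansible (the port values here are just examples):

ansible-playbook a-side-interfaces.yml -e "a_side_port1=1/1/3 b_side_port1=1/1/3"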

 

What's next?

As my playbooks advance I'll be updating this blog. I'll show you how to verify the configuration you're deploying at each step, and how to stop and roll back in case something hasn't gone according to plan. Because, let's face it, things don't always go to plan.

 

Troubleshooting OVS/Openflow and L2GW connectivity on Openstack

The Story so far

In the previous post, we set up the Openstack L2GW plugin to talk to an OVSDB VTEP switch, which provided the function to map VXLANs on the Openstack deployment to VLANs on the physical data centre network. We were able to ping from VM1 to the external host VM3, but ping would not work from VM3 to VM2 until VM2 had initiated a ping to VM3. This is down to the fact that the Openstack controller does not flood unknown broadcast or multicast traffic, in this case an ARP request from VM3 looking for the MAC address of VM2. This post will go through how to troubleshoot Neutron and OpenVirtualSwitch networking on Openstack and will provide all the commands needed to understand how it fits together.

But before we start, here is a diagram giving an overview of what the architecture looks like once the L2GW has been set up and is operational. You'll see the "vxln74" interface that was created on the CumulusVX switch, as well as the "br-vxln74" bridge interface which bridges "vxln74" and "swp1" together. The RED dashed line is the path the tunnelled traffic takes to get from the Openstack Controller to the VXSwitch. The GREEN dashed line is the virtual network extension between the Openstack VXLAN and the data centre physical network Vlan 114.

[Diagram: L2GW architecture, tunnel path and Vlan 114 extension]

 

OpenVirtualSwitch

Before we get into understanding why ARP does not work from VM3 to the Openstack VMs 1 and 2, we first need to get to grips with the networking internals of Openstack, in particular OpenVirtualSwitch (OVS). There are plenty of tutorials out there on how to enable and configure the OpenVirtualSwitch plugin, so let's jump straight into how it works.

OVS sets up the following bridges as required on each Openstack Node.

  • br-ex - Bridges the internal networking of your Openstack deployment to the external physical interface of the host machine
  • br-int - The bridge that the internal Openstack VMs connect to
  • br-tun - The bridge that bridges the tunnel overlay network to the internal Openstack network that the VMs are on

Hopefully this diagram makes it a bit clearer. You can see br-ex bridging the internal networking to the physical host's network interface. The purple dashed line between the br-tun bridges represents the tunnelling between the hosts (VXLAN in this deployment).

[Diagram: br-ex, br-int and br-tun bridges on each node]

 

Let's see what flows have been configured on the br-tun bridge. To do this we enter the ovs-ofctl dump-flows br-tun command on the node you want to investigate. In this scenario let's do this on the Controller node.

NOTE: The output has been edited to make it easier to read by separating the flows into their respective tables.

 

ovs-ofctl dump-flows br-tun
table=0, n_packets=228, n_bytes=24067, priority=1,in_port="patch-int" actions=resubmit(,2)
table=0, n_packets=8734, n_bytes=2874398, priority=1,in_port="vxlan-c0a87a05" actions=resubmit(,4)
table=0, n_packets=0, n_bytes=0, priority=1,in_port="vxlan-c0a87a14" actions=resubmit(,4)
table=0, n_packets=0, n_bytes=0, priority=0 actions=drop

table=2, n_packets=161, n_bytes=18805, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
table=2, n_packets=67, n_bytes=5262, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)

table=4, n_packets=8734, n_bytes=2874398, priority=1,tun_id=0x4a actions=mod_vlan_vid:1,resubmit(,10)
table=4, n_packets=0, n_bytes=0, priority=0 actions=drop

table=10, n_packets=8734, n_bytes=2874398, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xc1aa7f968895b11c,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:"patch-int"

table=20, n_packets=0, n_bytes=0, priority=2,dl_vlan=1,dl_dst=fa:16:3e:8b:4f:7c actions=strip_vlan,load:0x4a->NXM_NX_TUN_ID[],output:"vxlan-c0a87a05"
table=20, n_packets=160, n_bytes=18763, priority=2,dl_vlan=1,dl_dst=fa:16:3e:60:e0:24 actions=strip_vlan,load:0x4a->NXM_NX_TUN_ID[],output:"vxlan-c0a87a05"
table=20, n_packets=1, n_bytes=42, hard_timeout=300, priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:93:07:87 actions=load:0->NXM_OF_VLAN_TCI[],load:0x4a->NXM_NX_TUN_ID[],output:"vxlan-c0a87a05"
table=20, n_packets=0, n_bytes=0, priority=0 actions=resubmit(,22)

table=22, n_packets=21, n_bytes=1874, priority=1,dl_vlan=1 actions=strip_vlan,load:0x4a->NXM_NX_TUN_ID[],output:"vxlan-c0a87a14",output:"vxlan-c0a87a05"
table=22, n_packets=37, n_bytes=2590, priority=0 actions=drop

Starting with Table 0, you can see that packets arriving on ports "vxlan-c0a87a05" and "vxlan-c0a87a14" are submitted to Table 4.

Table 4 matches packets that have a tunnel ID of 0x4a (which is 74 in decimal), modifies the VLAN ID to 1 and then submits the packet to Table 10, where the learning of source MAC addresses happens. The packet is then output on port "patch-int".
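You can also dump a single table at a time, which is handy for watching the entries that Table 10's learn action installs into Table 20 appear and age out (they carry a hard_timeout of 300 seconds):

ovs-ofctl dump-flows br-tun table=20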

But what are these "vxlan-c0a87a05" and "vxlan-c0a87a14" interfaces?

ovs-vsctl show 
...
    Bridge br-tun
...
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-c0a87a14"
            Interface "vxlan-c0a87a14"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="192.168.122.2", out_key=flow, remote_ip="192.168.122.20"}
        Port "vxlan-c0a87a05"
            Interface "vxlan-c0a87a05"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="192.168.122.2", out_key=flow, remote_ip="192.168.122.5"}

The output shows us that port vxlan-c0a87a05 is the tunnel between the Openstack Controller and Compute nodes. Port vxlan-c0a87a14 is the tunnel between the Controller node (192.168.122.2) and the VTEP OVSDB switch (192.168.122.20).

So, let's see how this impacts ARP traffic coming from external VM3 when it's ARPing for VMs 1 and 2 on the Openstack deployment. There is a great tool that allows us to simulate a flow on the OpenVirtualSwitch to see how the switch will handle it.

We want to see what the OpenVirtualSwitch will do with an ARP request coming from the VTEP OVSDB switch, arriving on port vxlan-c0a87a14. To do this we use ovs-appctl ofproto/trace and specify a few parameters. In this instance we specify:

  • the bridge (br-tun)
  • the input port the packet will arrive on (in_port="vxlan-c0a87a14")
  • the type of packet (arp)
  • the tunnel ID (tun_id=0x4a)
ovs-appctl ofproto/trace br-tun in_port="vxlan-c0a87a14",arp,tun_id=0x4a
Flow: arp,tun_id=0x4a,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,arp_spa=0.0.0.0,arp_tpa=0.0.0.0,arp_op=0,arp_sha=00:00:00:00:00:00,arp_tha=00:00:00:00:00:00

bridge("br-tun")
----------------
 0. in_port=2, priority 1, cookie 0xc1aa7f968895b11c
    goto_table:4
 4. tun_id=0x4a, priority 1, cookie 0xc1aa7f968895b11c
    push_vlan:0x8100
    set_field:4097->vlan_vid
    goto_table:10
10. priority 1, cookie 0xc1aa7f968895b11c
    learn(table=20,hard_timeout=300,priority=1,cookie=0xc1aa7f968895b11c,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[])
     >> suppressing side effects, so learn action ignored
    output:1

bridge("br-int")
----------------
 0. priority 0, cookie 0x763d21bab4b2b0d6
    goto_table:60
60. priority 3, cookie 0x763d21bab4b2b0d6
    NORMAL
     -> no learned MAC for destination, flooding

bridge("br-ex")
---------------
 0. in_port=1, priority 2, cookie 0x6a615df2d356af00
    drop

As seen in the output, the packet arrives inbound on port 2 at Table 0 and is sent to Table 4, which modifies the VLAN and sends the packet on to Table 10, where it is output to port 1 (patch-int, the interface that connects to br-int). In Table 0 on br-int, the packet is sent to Table 60, where it's flooded out of all ports on that bridge, including towards br-ex. At br-ex, the packet is dropped.

Let's look at the port numbering on br-tun to help clarify what the previous output has shown us. The command ovs-ofctl show <bridge> really helps to map port numbers to port names.

ovs-ofctl show br-tun
...
 1(patch-int): addr:da:01:2c:08:f1:bf
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 2(vxlan-c0a87a14): addr:ea:d5:63:85:a5:aa    <---- packet arrives on this port
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 3(vxlan-c0a87a05): addr:9e:d9:4c:43:81:c1
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-tun): addr:3e:72:52:51:35:40
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max

I found that the command ovs-vsctl show really helps to see how the ports on the three internal bridges are connected. Here is a more detailed diagram to help illustrate the bridges.

[Diagram: detailed view of the internal bridges and their port connections]

Now that you can visualise the internal switches, let's get back to the ARP request from VM3 via the VTEP OVSDB switch. Firstly, the CumulusVX switch needs to know to send broadcast traffic to the Control node (or to a dedicated Network node, if you have one). If you dump the VTEP database and look at the Mcast_Macs_Remote table, you'll see there is no information.

sudo ovsdb-client dump --pretty tcp:192.168.122.20:6640
...

Mcast_Macs_Remote table
MAC _uuid ipaddr locator_set logical_switch
--- ----- ------ ----------- --------------

Let's insert the information needed to send broadcast and multicast traffic to the Control node of the Openstack deployment. We do this with the sudo vtep-ctl add-mcast-remote <UUID of logical switch> unknown-dst <ip address of control node> command. You can find the UUID of the logical switch by entering the following:

sudo vtep-ctl list-ls
71b65a58-4635-470c-ba1d-66776e666602
sudo vtep-ctl add-mcast-remote 71b65a58-4635-470c-ba1d-66776e666602 unknown-dst 192.168.122.2

Now we have the following entry in the database.

sudo ovsdb-client dump --pretty tcp:192.168.122.20:6640
...

Mcast_Macs_Remote table
MAC         _uuid                                ipaddr locator_set                          logical_switch                      
----------- ------------------------------------ ------ ------------------------------------ ------------------------------------
unknown-dst 51b63356-e9a0-4d04-aab0-89f986135930 ""     a5ddb969-7aa3-4aaf-95aa-9440f409e59a 45528526-ea00-4ada-93d7-f8a5d9b48107

The ARP request is now sent to the Control node and arrives on port 2; it is sent out of port 1 and flooded on br-int, but it doesn't get sent out of port 3. We'll need to manually set a flow for this to happen. So which table should we send the packet to for it to go out of "vxlan-c0a87a05"? If you look at the output of ovs-ofctl dump-flows br-tun, you'll note that Table 22 sends the packet out of both "vxlan-c0a87a05" and "vxlan-c0a87a14".

ovs-ofctl dump-flows br-tun
 ...
table=22, n_packets=21, n_bytes=1874, priority=1,dl_vlan=1 actions=strip_vlan,load:0x4a->NXM_NX_TUN_ID[],output:"vxlan-c0a87a14",output:"vxlan-c0a87a05"
table=22, n_packets=37, n_bytes=2590, priority=0 actions=drop

We'll need to add a flow in Table 4 to send the packet on to Table 10 as well as Table 22. We do this by adding the following flow.

ovs-ofctl add-flow br-tun "table=4,tun_id=0x4a,priority=2,actions=mod_vlan_vid:1,resubmit(,10),resubmit(,22)"

And now verify that you see a new flow in Table 4.

ovs-ofctl dump-flows br-tun
...
table=4, n_packets=12608, n_bytes=4125695, priority=1,tun_id=0x4a actions=mod_vlan_vid:1,resubmit(,10)
table=4, n_packets=0, n_bytes=0, priority=2,tun_id=0x4a actions=mod_vlan_vid:1,resubmit(,10),resubmit(,22)

If you start a ping from VM3 on the external host to either VM1 or VM2 on the Openstack deployment, and you dump the datapath flows on the Compute node, you will see the ARP request, the ARP reply and the ICMP packets.

ovs-dpctl dump-flows 
<ICMP replies>
recirc_id(0),in_port(4),eth(src=fa:16:3e:60:e0:24,dst=52:54:00:06:8c:bd),eth_type(0x0800),ipv4(tos=0/0x3,frag=no), packets:2, bytes:196, used:0.328s, actions:set(tunnel(tun_id=0x4a,src=192.168.122.5,dst=192.168.122.20,ttl=64,tp_dst=4789,flags(df|key))),5

<ARP Reply>
recirc_id(0),tunnel(tun_id=0x4a,src=192.168.122.20,dst=192.168.122.5,flags(-df-csum+key)),in_port(5),eth(src=52:54:00:06:8c:bd,dst=fa:16:3e:60:e0:24),eth_type(0x0806), packets:0, bytes:0, used:never, actions:4

<ICMP Echo>
recirc_id(0),tunnel(tun_id=0x4a,src=192.168.122.20,dst=192.168.122.5,flags(-df-csum+key)),in_port(5),eth(src=52:54:00:06:8c:bd,dst=fa:16:3e:60:e0:24),eth_type(0x0800),ipv4(frag=no), packets:2, bytes:196, used:0.329s, actions:4

<ARP request>
recirc_id(0),in_port(4),eth(src=fa:16:3e:60:e0:24,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.3.10,tip=192.168.3.114,op=1/0xff), packets:0, bytes:0, used:never, actions:set(tunnel(tun_id=0x4a,src=192.168.122.5,dst=192.168.122.2,ttl=64,tp_dst=4789,flags(df|key))),5,set(tunnel(tun_id=0x4a,src=192.168.122.5,dst=192.168.122.20,ttl=64,tp_dst=4789,flags(df|key))),5,push_vlan(vid=1,pcp=0),1,pop_vlan,3

 

Summary

Troubleshooting a physical network infrastructure is one thing: you follow the packet through each hop and diagnose each step of the way to resolve connectivity problems. SDN adds a whole new dimension to this paradigm, and it is critical that network engineers understand the tools available to them to help diagnose packet flows in a virtual environment. I hope that some of the commands given here will give network engineers new to SDN an idea of where to start when diagnosing packet flows on OpenVirtualSwitch.

List of commands

  • ovs-ofctl dump-flows <bridge>  -  shows the OpenFlow tables programmed on the virtual switch
  • ovs-vsctl show  -  prints the port layout of each bridge on the virtual switch
  • ovs-appctl ofproto/trace <bridge> <parameters>  -  shows how OpenFlow will handle a specific flow through each bridge
  • ovs-ofctl show <bridge>  -  shows the OpenFlow port numbers on the specified bridge
  • ovsdb-client dump --pretty <connection ID>  -  dumps the contents of the OVS database on the virtual switch
  • ovs-ofctl add-flow <bridge> <parameters>  -  programs a flow into the specified bridge
  • ovs-dpctl dump-flows  -  dumps the flows the OpenFlow datapath is handling at that moment

In this post, you followed an ARP packet from the external VM, through the CumulusVX switch, to the Openstack Controller node and through its br-tun bridge, and finally saw the ARP request and replies on the Openstack Compute node. You saw how to dump a flow table, dump a switch's database, trace a flow and program a flow.

 


Openstack (Pike) L2GW setup on Ubuntu 16.04

 

What is L2GW

Layer 2 Gateway (L2GW) is an Openstack plugin which allows you to extend communication from virtual machines on your Openstack deployment to physical or virtual machines on a network outside of it. More specifically, it allows you to map VXLANs on your Openstack deployment to VLANs on your data centre network. This mapping function is performed by a VXLAN Tunnel End Point (VTEP) server.

 

Before we begin...

 The assumption at this point is that you already have a basic operational Openstack Deployment and at least have the following services installed, configured and working:

  • Keystone - Identity Service
  • Glance - Image Service
  • Nova - Compute Service
  • Neutron - Networking Service
  • Horizon - Dashboard

If you don't have these working yet, follow the respective installation guides found here for your operating system of choice.

 

Architecture

This deployment is made up of a few key components: an Openstack Controller; a Compute node hosting two VMs on a VXLAN (74); a Cumulus VX virtual switch acting as the VTEP; and a KVM virtual machine hosted outside of the Openstack environment, directly connected to the VXSwitch on a Vlan (114). All of these are hosted on a single physical server running Ubuntu 16.04.

 

[Diagram: deployment architecture]

 

VXSwitch installation

Sign up at Cumulus Networks for a free trial of their Cumulus VX virtual switch, then download the image that applies to the virtualisation technology you're using.

Spin up a VM with the Cumulus VX image as the disk. Put one interface into the bridge that the Openstack nodes are in, in this case virbr0, and the other interface into the bridge that the external hosts will connect to, virbr2 in the diagram.

virt-install --connect qemu:///system -n vxswitch --vcpus=1 -r 512 --network=bridge:virbr0,model=virtio --network=bridge:virbr2,model=virtio \
 -f /var/lib/libvirt/images/vxswitch.qcow2 \
--vnc --noautoconsole --boot hd

Open the switch's console and log in with the username cumulus and password CumulusLinux!

Add a management IP address to eth0.

net add interface eth0 ip address 192.168.122.20/24
net pending
net commit

Reboot the switch and you should be able to connect to it from your management network.

Edit the following file to force the VTEP to start when the switch boots: modify "START=no" to "START=yes" and save the file.

/etc/default/openvswitch-vtep
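If you'd rather do it in one line than open an editor (assuming the stock file layout):

sudo sed -i 's/^START=no/START=yes/' /etc/default/openvswitch-vtep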

Now we start the openvswitch-vtep service and then bootstrap the database with some information.

sudo service openvswitch-vtep start 
sudo vtep-bootstrap VXSWITCH 192.168.122.20 192.168.122.20 --no_encryption

The vtep-bootstrap command does the following:

  • Creates the VTEP OVSDB database schema. This is where the VXLAN to VLAN mappings are stored.
  • Creates a physical switch called "VXSWITCH" within that schema.
  • Sets the VTEP IP address to 192.168.122.20.
  • Starts listening for incoming OVSDB connections on 192.168.122.20.

At this point your VXSwitch is set up and we can proceed with the installation of the L2GW Plugin on the Openstack environment.

 

Openstack L2GW Plugin installation

If you've been working with Openstack, you'll know that sometimes it's hard to find documentation specific to your deployment and when you do find it, it doesn't always work as expected. Or you'll encounter a problem and see that other people have encountered the same problem, but there are no solutions. I encountered a lot of this with my L2GW installation, so I will share some of the common problems people have encountered and what I did to resolve them. I'm hoping the sharing of my journey will help someone else expedite their installation. No point in the greater community coming across the same problems without solving them together and sharing the answers.

First up, install the neutron-l2gateway-agent and python-networking-l2gw packages from the Xenial repositories.

apt install neutron-l2gateway-agent python-networking-l2gw

 Next up, edit your neutron.conf file to add the service plugin "networking_l2gw.services.l2gateway.plugin.L2GatewayPlugin"

vi /etc/neutron/neutron.conf 
...
service_plugins = router,networking_l2gw.services.l2gateway.plugin.L2GatewayPlugin

Then you need to edit your l2gateway_agent.ini file to add the IP address of your OVSDB server. This is the IP address you assigned to the VXSwitch in the previous section.

vi /etc/neutron/l2gateway_agent.ini
...
ovsdb_hosts = 'ovsdb1:192.168.122.20:6640'
# Example: ovsdb_hosts = 'ovsdb1:16.95.16.1:6632,ovsdb2:16.95.16.2:6632'

WARNING! You'll notice that in the example provided in the config file, port 6632 is used to connect to the VTEP OVSDB. But if you do a "netstat -a" on the VXSwitch, you'll notice that the switch is listening on port 6640. This tripped me up in the beginning, and I wasted a fair amount of time chasing "connection refused" messages all over the Openstack deployment.

 Now we need to stop the Neutron service, update the database with the information in the l2gw_plugin.ini file and then start Neutron again.

service neutron-server stop
neutron-db-manage --config-file /etc/neutron/neutron.conf \
--config-file /etc/neutron/l2gw_plugin.ini  upgrade head
service neutron-server start

The next step in the installation gave me a lot of problems. Most installation instructions say that you need to configure the neutron-server.service file to load the /etc/neutron/l2gw_plugin.ini config file on start. This is supposed to feed Neutron the service provider information it needs to load the plugin. However, this doesn't work, and you get the following errors in your /var/log/neutron/neutron-server.log file:

INFO neutron.manager	Loading Plugin: networking_l2gw.services.l2gateway.plugin.L2GatewayPlugin
ERROR neutron.services.service_base	No providers specified for 'L2GW' service, exiting

To resolve this you need to explicitly define the service providers in your neutron.conf file.

[service_providers]
service_provider=L2GW:l2gw:networking_l2gw.services.l2gateway.service_drivers.rpc_l2gw.L2gwRpcDriver:default

Once you've done this, you'll see the following in your neutron-server.log

...
INFO neutron.api.extensions Loaded extension: l2-gateway
INFO neutron.api.extensions Loaded extension: l2-gateway-connection
...

Next, restart the neutron-server service and start the neutron-l2gateway-agent service.

systemctl daemon-reload
service neutron-server restart
service neutron-l2gateway-agent start

At this point, source the saved admin credentials and check to see that the Neutron-L2gateway-Agent is running.

openstack network agent list -c "Agent Type" -c Host -c Alive -c State
+--------------------+------------+-------+-------+
| Agent Type         | Host       | Alive | State |
+--------------------+------------+-------+-------+
| Metadata agent     | controller | :-)   | UP    |
| Open vSwitch agent | compute    | :-)   | UP    |
| L3 agent           | controller | :-)   | UP    |
| L2 Gateway agent   | controller | :-)   | UP    |
| DHCP agent         | controller | :-)   | UP    |
| Open vSwitch agent | controller | :-)   | UP    |
+--------------------+------------+-------+-------+

Now it's just a case of creating an L2 Gateway device and creating the connection between the Openstack tenant network and the external VLAN network.

To create the L2 Gateway device, enter the neutron interactive shell and run the following command:

neutron

l2-gateway-create --device name="vxswitch",interface_names="swp1" CUMULUS-L2GW

It is important to note which interface on the VXSwitch connects to the host you're trying to reach. In this instance it's switchport "swp1".

Lastly, let's set up the connection between the tenant network and the VLAN network. To do this, you'll need to know the name of the Openstack tenant network that you want to bridge to the outside. In this case we want to bridge the "selfservice3" network.

openstack network list -c Name -c Subnets
+--------------+--------------------------------------+
| Name         | Subnets                              |
+--------------+--------------------------------------+
| external-net | 7942d261-c15b-4aa6-b5e7-e9895f15d070 |
| selfservice3 | a7d5a449-a097-40d5-a85d-011afdb35a83 |
| selfservice2 | a32c53ae-f219-478f-b3ac-73e174228600 |
| selfservice1 | 34bdfd01-c9cf-4096-9f78-23bb6b4fb3a6 |
+--------------+--------------------------------------+

openstack network show selfservice3
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | UP                                   |
| availability_zone_hints   |                                      |
| availability_zones        | nova                                 |
| created_at                | 2018-01-15T17:00:06Z                 |
| description               |                                      |
| dns_domain                | None                                 |
| id                        | 71b65a58-4635-470c-ba1d-66776e666602 |
| ipv4_address_scope        | None                                 |
| ipv6_address_scope        | None                                 |
| is_default                | None                                 |
| is_vlan_transparent       | None                                 |
| mtu                       | 1450                                 |
| name                      | selfservice3                         |
| port_security_enabled     | True                                 |
| project_id                | f2fd567fb52c4da5ac8b806a2fb8b89d     |
| provider:network_type     | vxlan                                |
| provider:physical_network | None                                 |
| provider:segmentation_id  | 74                                   |
| qos_policy_id             | None                                 |
| revision_number           | 3                                    |
| router:external           | Internal                             |
| segments                  | None                                 |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   | a7d5a449-a097-40d5-a85d-011afdb35a83 |
| tags                      |                                      |
| updated_at                | 2018-01-15T17:00:07Z                 |
+---------------------------+--------------------------------------+

We also need to know what the Vlan number is on the data centre network that the host is connected to. In this case we're using Vlan 114.

neutron
l2-gateway-connection-create --default-segmentation-id 114 CUMULUS-L2GW selfservice3

Created a new l2_gateway_connection:
+-----------------+--------------------------------------+
| Field           | Value                                |
+-----------------+--------------------------------------+
| id              | 2268f9d9-8617-4a02-9eea-47fa5aa7c482 |
| l2_gateway_id   | 1adf7124-da1b-4265-96bb-1fbfe371e322 |
| network_id      | 71b65a58-4635-470c-ba1d-66776e666602 |
| segmentation_id | 114                                  |
| tenant_id       | 5326fb7f7a844bff931ae9205b20f799     |
+-----------------+--------------------------------------+

 

If you go back to your VXSwitch, you will see that a new virtual port called vxln74 has been created, along with a bridge called br-vxln74.

brctl show
bridge name	bridge id		STP enabled	interfaces
br-vxln74		8000.829608997b08	no		vxln74

But there is something missing from the bridge: we need swp1 to be included in it. Looking a little closer, the port is in a DOWN state.

ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:57:69:0f brd ff:ff:ff:ff:ff:ff
3: swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:0d:3f:90 brd ff:ff:ff:ff:ff:ff
4: br-vxln74: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default 
    link/ether 82:96:08:99:7b:08 brd ff:ff:ff:ff:ff:ff
5: vxln74: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-vxln74 state UNKNOWN mode DEFAULT group default 
    link/ether 82:96:08:99:7b:08 brd ff:ff:ff:ff:ff:ff
6: swp1.114@swp1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default 
    link/ether 52:54:00:0d:3f:90 brd ff:ff:ff:ff:ff:ff

You'll also notice that a new subinterface, "swp1.114", has been created for us. This is the Vlan 114 interface on swp1.

Let's bring up swp1

sudo ifconfig swp1 up

And now we have a switchport in br-vxln74

brctl show
bridge name	bridge id		STP enabled	interfaces
br-vxln74		8000.5254000d3f90	no		swp1.114
							vxln74
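Note that ifconfig changes don't survive a reboot. To bring swp1 up persistently, you could use the same NCLU workflow used earlier for eth0 (assuming NCLU is managing swp1):

net add interface swp1
net pending
net commit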

For completeness' sake, here is the /etc/network/interfaces config for the external VM host.

auto ens6
iface ens6 inet static

auto ens6.114
iface ens6.114 inet static
        address 192.168.3.114
        netmask 255.255.255.0

 

Testing

So, you're really excited: you hop onto your Openstack VMs and you can ping the external VM from VM1. Then you ping from the external VM to Openstack VM1. Life has never been better. And then you decide to ping Openstack VM2 from the external VM and it doesn't work. You open the console of Openstack VM2, you ping the external VM, AND IT WORKS!?!?!

Something isn't quite right. But we'll get into that in the next post.

 

References

There are some really great articles out there explaining how to build an Openstack deployment and the various other components I've used in this article. Here are the links to sources of a lot of my information.

 
