(Air) Quality of Life

The quality of the air we breathe is something we often take for granted. We open the windows to let "fresh air" into the house. We go for long walks in the countryside to fill our lungs with clean air. But how clean is the air we are breathing? Tech companies are starting to display Air Quality Index (AQI) data in their weather and transit maps, and smart home fans and air purifiers measure particles and gases and can warn you when readings are high.


Roll on Christmas 2020: lockdown number whatever and three weeks of spare time. What better time to experiment with building my own air quality meter?


The Idea

Hook up some sensors to a Raspberry Pi, store the data in a database, graph it, and apply machine learning to it with the help of Splunk's Machine Learning Toolkit (MLTK). That last part of the plan was a stretch: I don't have an "always on" machine in my house, but I do have a VM running on Google Cloud, so that was the perfect place to install Splunk. More on that later.


Here's a diagram of the layout of this little project:




My soldering and electrical skills are not the best; it's a miracle I've made it this far in life. So I searched for a board with the sensors already integrated and came across the Enviro+ Air Quality sensor board available from Pimoroni. It was perfect for the Pi 3 I had lying around from my Amazon Alexa experimentation, and there is a handy GitHub repo for the Enviro+ with installation scripts and some Python examples to get you up and running in no time.

One of the examples publishes the sensor readings to an MQTT message bus, where they can be subscribed to by other applications. This was interesting to me as it's very efficient, and the data could be consumed by both a graphing engine and an analytics engine.
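To give a feel for what flows over the bus, here's a minimal sketch of the kind of JSON message involved. The field names, the topic name "enviroplus" and the broker hostname are my own illustrative placeholders, not necessarily what Pimoroni's example uses:

```python
import json

def build_payload(temperature, pressure, humidity, serial="0000000000000000"):
    """Bundle one set of sensor readings into a JSON string (hypothetical field names)."""
    return json.dumps({
        "temperature": temperature,  # degrees Celsius
        "pressure": pressure,        # hPa
        "humidity": humidity,        # % relative humidity
        "serial": serial,            # the Pi's serial number, kept as a string
    })

# Publishing with paho-mqtt would then be a one-liner (not run here;
# topic and hostname are illustrative placeholders):
# import paho.mqtt.publish as publish
# publish.single("enviroplus", build_payload(21.5, 1013.2, 48.0), hostname="raspberrypi.local")
```

Keeping the serial as a string is what makes the `json_string_fields = ["serial"]` setting on the Telegraf side relevant later.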



Knowing that the data could be "streamed", I looked at the InfluxData tools Telegraf and InfluxDB. Telegraf has many built-in input plugins which allow it to collect multiple data types and format the data to be consumed by other applications (such as InfluxDB).


Setting up Telegraf to subscribe to the stream was straightforward: give it the address and port of the server streaming the data, the topic to subscribe to, and the format the data is in.

  [[inputs.mqtt_consumer]]
  name_override = "sensors"
  name_prefix = "influx"
  servers = ["tcp://"]
  qos = 0
  connection_timeout = "30s"
  topics = [
  ]
  data_format = "json"
  json_string_fields = ["serial"]

Next, configure Telegraf to send the data to InfluxDB and tell it which database to use.

   [[outputs.influxdb]]
   urls = [""]
   database = "enviroplus"

 A few seconds later, the data is available in the database.
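A quick way to sanity-check this is to query the database with the InfluxDB 1.x `influx` CLI; the measurement name "sensors" comes from the `name_override` setting above:

influx -database enviroplus -execute 'SELECT * FROM "sensors" ORDER BY time DESC LIMIT 5'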



Graphing the Data

Grafana is a great graphing engine. It's not perfect, but it's fairly straightforward to set up a basic graph. Once you set up the data source to point to the InfluxDB database, you select it in the graph panel's query, pick the visualisation that best displays the data, and soon you'll have a dashboard giving you all the information you're looking for.




Getting the data to Splunk 

There were a few ways to skin this cat (no cats were hurt in the making of this project). I could have set up Telegraf to stream directly to Splunk, but instead I installed a Splunk forwarder on the Pi and let it monitor an output file that I had configured Telegraf to write to while I was debugging.

   [[outputs.file]]
   files = ["stdout", "/tmp/metrics.out"]

Then it's just a case of telling the Splunk forwarder which file to monitor.

./splunk add monitor /tmp/metrics.out
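Under the hood, that command just writes a monitor stanza into the forwarder's inputs.conf. It ends up looking something like this (the index and sourcetype values here are assumptions; by default the command sets neither):

   [monitor:///tmp/metrics.out]
   disabled = false
   index = main
   sourcetype = telegraf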


I'm running Splunk in a Docker container on my Google Cloud VM. While this is a quick way to spin up a Splunk instance for testing, it's not so great when you have to destroy the container, because you lose all of your settings and the data you've ingested. I mapped two local volumes into the container to keep the data persistent between containers should I ever need to rebuild the instance.

I also exposed port 9997 which is the default port Splunk listens on for receiving forwarded data.
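On the Pi side, the forwarder also needs to be told where to send its data. With the default receiving port, that's one more CLI command (the hostname is a placeholder for your own Splunk server):

./splunk add forward-server <my-gcp-vm>:9997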

docker run -d \
  -p 8000:8000 -p 9997:9997 \
  -v /home/keithchurchill/splunk/etc:/opt/splunk/etc \
  -v /home/keithchurchill/splunk/var:/opt/splunk/var \
  -e "SPLUNK_START_ARGS=--accept-license" \
  -e "SPLUNK_PASSWORD=<nothing to see here :-)>" \
  --name splunk splunk/splunk:latest start


A few SPL queries later and I have a basic dashboard up and running. I still need to add a time picker and a few other bells and whistles, but for now this will do.
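For a flavour of those queries, a dashboard panel might use something like the following. The index, sourcetype and field name are assumptions for illustration, and depend on how the Telegraf file output and the monitor input were configured:

index=main sourcetype=telegraf source="/tmp/metrics.out"
| timechart span=5m avg(temperature) AS "Temperature (C)"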



Next Steps / To-Do

I mentioned earlier that I want to use the Splunk Machine Learning Toolkit on the data, to see if I can predict air quality or work out which factors influence the quality of the air around me. To do this, I need data. Lots of data. The more the better.

I'm going to import weather data into Splunk: temperature, wind speed, wind direction, humidity and air pressure. By overlaying this with the air quality data I have, maybe machine learning can find some correlation between the two. If it can, it should be possible to take a weather forecast and turn it into an air quality forecast specific to my house and the immediate factors around it, such as agriculture, industry and road traffic.
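If that works out, the MLTK side might be as simple as an SPL `fit` command along these lines. This is entirely illustrative: the lookup name, field names, target and choice of algorithm are all assumptions at this stage:

| inputlookup weather_and_air_quality.csv
| fit LinearRegression pm25 from temperature wind_speed humidity pressure into aqi_model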