atmospheric carbon dioxide sensing with an Arduino

I was asked by my Lab Manager to investigate whether one of our CO2 incubators was working properly. Apparently there’s some doubt as to whether it was maintaining CO2 levels. CO2 is an easy gas to detect for the same reason that it causes global warming: it absorbs infrared radiation. You can now buy cheap Non-Dispersive InfraRed CO2 detectors for about US$40 from websites such as AliExpress. However, CO2 incubators typically enrich CO2 to about 5% by volume while most of the cheap sensors only measure up to around 0.5%, or 5,000 parts per million [ppm]. I found a sensor from Sandbox Electronics that works up to 10% by volume, or 100,000ppm. Perfect for testing CO2 incubators.

The MH-Z16 NDIR CO2 Sensor costs $67.95 and comes with a UART to I2C adapter. This makes powering it and communicating with it quite simple: just connect four wires to the development board of your choice! I used an Arduino Nano to read the values from the sensor and send them to the laptop over the USB connection that was also powering it. The sensor can be calibrated in atmospheric CO2 conditions at the push of a button. I have reservations about this one-point calibration but “you pays yer money…”, as my dad used to say.

The Nano is sitting on a screw terminal shield with mount holes for easy connections ($4 well spent for making secure, non-permanent connections to the Arduino). On the laptop screen you can see data coming in from the example sketch I copied from the vendor’s website and using the Arduino library they provided. The sketch sends a reading every second. The serial output shows a starting CO2 reading of 458ppm which climbs slowly to >1000ppm when the photo was taken. This is because I was sitting in front of the sensor taking the photos and it’s registering a local CO2 increase as a result of my breath. If I huffed on it it climbed to >10,000ppm, or 1% by volume. The sensor is very slow to adjust to changes. I had to huff on it for a good couple of minutes until it got that high. Wikipedia tells me that exhaled breath is normally 4-5% CO2 so I guess if I’d kept at it for a while it would have got up there. I might have passed out before then though! 😆 When I moved it away from me it took several more minutes to come back down to ambient levels. Not ideal but fine for my application and budget!

The next task was to make an enclosure to protect the sensor, logic boards and delicate wiring. A couple of evenings spent playing with OpenSCAD yielded an alpha version which I cut on the lasercutter today and tried out. I had to correct a few mistakes in the design but ultimately produced this:

Which became reality as this:

I would have taken a picture of the two boards mounted inside but the lasercut bits fitted together so snugly I didn’t want to try and separate them again in case something broke. You can see the lights of the Arduino inside through the ventilation holes  🙂

I’ve put the sensor in an (allegedly) working CO2 incubator and left it for 20 minutes to stabilise. The sensor read about 40,000ppm, or 4% by volume. As the incubator was meant to be maintaining 5% I’m not too sure what to make of this. I will have to find another incubator to make a replicate measurement. Maybe all of our incubators are a bit off! 😵 However, I’m very pleased with how simple the sensor was to get working and seeing as all you need to run it is a laptop with the Arduino drivers loaded and a serial terminal, such as PuTTY, it is very accessible to even the relatively non-technically minded. 😎


plotting LC-MS data using Python and Plotly

I want to illustrate LC-MS data in 3D to illustrate patterns of isotopes, adducts and molecular structures. I want my students to be able to present these as figure in their dissertations and theses and ultimately manuscripts. I’ve managed this using the Plotly library for Python. Here’s an example from my MSc students’ LC-MS analysis of sterol glucosides in lecithin:

The three ions at the top are acetic adducts of the three common plant sterol glucosides, whereas the three at the bottom are chlorine adducts of the same molecules. You can see they coelute and their m/z delta is 24. This is just an image, I haven’t worked out how to post interactive Plotly figures yet.

Here’s the code:

# This python script strips the time, m/z and intensity values from an LC-MS.txt file
# and plots it as a heatmap using the plotly library

# Copyright (C) 2018 Chris Pook

#This program is free software: you can redistribute it and/or modify
#it under the terms of the GNU General Public License as published by
#the Free Software Foundation, either version 3 of the License, or
#(at your option) any later version.

#This program is distributed in the hope that it will be useful,
#but WITHOUT ANY WARRANTY; without even the implied warranty of
#GNU General Public License for more details.

#You should have received a copy of the GNU General Public License
#along with this program. If not, see <;.


# Start from a full scan (MS2 scan) mass spectrometry file in your native format
# convert this to an MS.txt file using msconvert (select text output)
# You can get msconvert from the Proteowizard package

# numbers <1 flatten the data (less contrast)
# numbers >1 increase contrast
factor = 1

import glob, os
from Tkinter import Tk
from tkFileDialog import askopenfilename
import pandas as pd
import plotly.plotly as py
import plotly.graph_objs as go
import plotly.offline as offline

# get a MS txt file
Tk().withdraw() # we don’t want full GUI, so keep the root window from appearing
f = askopenfilename(filetypes=[(‘MS txt file’, ‘.txt’)],
title=’select one MS txt file’) # show an “Open” dialog box and return the path to the selected file
Folder = os.path.dirname(f) # strip out the folder path
aa = f.split(‘.’)
output = aa[0] + “_STACKED.csv”

# open the file and read it line-by-line
fOpen = open(f, “r”)
lines = fOpen.readlines()

# some variables
# the three arrays will end up holding lists of those values
# each triplet of values codes for one point on the heatmap
a = 0
secs = []
mz = []
signal = []
s = 0
m = 0
sig = 0

# now, let’s hack that data out!
for n in lines:
if ” cvParam: scan start time, ” in n:
# split off the text at the start of the row
b = n.split(‘,’)
#strip out whitespace
c = b[1].strip()
# convert to a float
s = float(c)
#print s
if ” binary: [” in n:
# split off the text at the start of the row
d = n.split(‘]’)
# strip the whitespace around the data
e = d[1].strip()
#turn it into a list of floats
f = [float(i) for i in e.split()]
#print f[0]
if a < 1:
m = f
a = 1
#print m[0]
#print m[1]
#print f
sig = f
#print sig[0]
#print sig[1]
for g in range(0, len(m)-1):
#print g
#print s
#print len(secs)
#print m[0]
#print sig[0]
a = 0

# a little debugging output
print len(secs)
print len(mz)
print len(signal)

# stick all that lovely data into a pandas dataframe
MS = [(‘seconds’,secs),
(‘m/z’, mz),
df = pd.DataFrame.from_items(MS)

# more debugging output
print df.head()

# layout details you can change
# either uncomment autosize or specify your own height & width
layout = go.Layout(
width = 800,
height = 600,
#autosize = True,
#showlegend = False,

# it’s heatmap time!
# plotly colourscales – take your pick!

trace = go.Heatmap(x = df[‘seconds’], y = df[‘m/z’], z = pow(df[‘signal’], factor), colorscale= “Hot”)

fig = go.Figure(data=data, layout=layout)
offline.plot(fig, filename=’my_awesome_LC-MS_heatmap.html’)


DIY thermal conditioning chamber

One of the cool sampling technologies we have here at AUT is Stir Bar Sorptive Extraction [SBSE]. The stir bars are little magnets coated with a layer of PDMS. You pop them into a liquid sample sat on a magnetic stirrer and over a period of a few minutes to an hour or more they will absorb any interesting organic compounds, even ones present at trace amounts, extracting and concentrating them from solution so that they can be analysed by GC-MS. In order to get your compounds off the stir bar and into the GC-MS you have to heat them to 300°C under an inert gas. Our Gerstel multipurpose sampler has a purpose-built doohicky to do this called a Thermal Desorption Unit. The heat vapourises the organic compounds from the PDMS trap and a flow of helium carries the organic compounds onto the GC column.

This is all very cool but before you can use the stir bars you need to thermally condition them to clean them of contaminants. The stir bars pick these contaminants up from anything they come into contact with. Its what they’re designed to do. However, it’s essential to be able to distinguish between contaminants the stir bars have picked up by accident and the interesting compounds I’d like to analyse. The thermal conditioning process releases any contaminants that might confound your analysis before you put the stir bars into your samples so it is an essential precursor to sensitive analysis. The problem is it takes up instrument time on the GC-MS. A lot of instrument time. The manufacturer recommends (pdf) that you condition stir bars for two hours prior to use. Analysing a single stir bar takes about an hour so total instrument time for one analysis is three hours.

I wanted an alternative way of thermally conditioning the stir bars that wouldn’t take up valuable instrument time. This would require some sort of enclosure that can be heated to 300°C and filled with a flow of inert gas. It has to be a flow to remove volatile contaminats as they evaporate. I have several other gas chromatographs in my lab besides the GC-MS which can easily reach 300°C so all I needed was an enclosure I could connect to a gas supply and put in there. I considered designing one using CAD and asking my engineer buddies to machine it using their awesome CNC mills. However, that would take time and effort. Instead I found aluminium enclosures for sale from RS that looked perfect and they only cost a few dollars. I drilled a 5mm hole through the side so that I could screw an HPLC compression screw through it. I then passed a piece of 1/16″ stainless HPLC line with a large internal diameter through the screw, added a ferrule and threaded a steel union onto the end on the inside so that the screw head was tight against the outside wall of the box and the union against the inside. I actually had to add a washer to the outside to get a snug seal. Passing gas through the line would now vent into the box through the hole in the union. The seal around the rim of the lid wasn’t particularly tight but it was good enough to allow me to pressurise the chamber with just a few KPa of nitrogen. You can see a video of gas  bubbling out of it underwater here. Here’s the box sitting inside the GC oven:


GC ovens have ready-made ports in the side specifically so that you can add peripherals so I ran the 1/16″ line out of one of these. I had to make a small cut in the outside insulation to pass the line through.


On the outside the 1/16″ line was connected to a 1/4″ piece of HDPE gas line that ran to the regulator on the gas cylinder in my rack. I didn’t use the expensive, instrument-grade nitrogen that I use to run the GCs for this purpose, of course, but cheap industrial-grade oxygen-free stuff. Even better is that I’m not using the ultra high purity helium that the GC-MS runs on either.


Analytical chemists will notice my stupid mistake immediately. I’ve located the box in a GC with a wax column. Wax columns are very useful but they can’t tolerate temperature above 250°C. Derp! Tomorrow I will move it to another GC so I can hit the required temperature. I’m going to try conditioning some cheap, PDMS-packed-tube traps first before I try the expensive stir bars just to make sure it doesn’t destroy stuff.

Presenting the results of your ANOVA.

When you write up the results of your statistical analysis in your dissertation or thesis or manuscript you can’t just present the p-value and claim you have a result. Instead you include summary statistics describing the data, a written description of what went up and what went down and you back this up by presenting the results of your ANOVA. For your typical one-way ANOVA you need to present what’s called the F-statistic and the Degrees of Freedom of your analysis, as well as the p-value. Here’s an example with two ANOVA results in bold:

The diameter of oocytes varied significantly between individuals from the three populations (F2,22 = 7.54, p < 0.01). Mature females from Froe Creek contained oocytes with a mean (±SD) diameter of 222.7 μm (±8.4). The Teign Estuary and Restronguet Creek females both had lower mean oocyte diameters with means of 206.5 (±14.1 μm) and 197.2 (±9.0), respectively. The Froe Creek mean was 108% of the Teign Estuary mean and 113% of the Restronguet Creek mean.

No significant differences were observed between the variances of the diameters measured in collected oocytes (F2,22 = 1.54, p = 0.239). The mean variance (±SD) of oocytes collected from Froe Creek individuals was 20.2 (±20.8), that of the Teign Estuary individuals was 20.2 (±11.7) and that of Restronguet Creek individuals 31.8 (±15.2).

Note that the letter ‘F‘ is capitalised and italicised, the numbers following it (the Degrees of Freedom) are subscript and separated by a comma, and the ‘p‘ is also italicised. Here’s an example of the output of a one-way ANOVA in R from the previous post on how to do an ANOVA in R. I have highlighted where each value appears.


Finally, it’s important to remember that the p-value is an estimate and so, if you have a significant result (p < 0.05), it is customary not to present the actual p-value given by the ANOVA but to indicate instead that it is less than one of three arbitrary values: p < 0.05, p < 0.01 and p < 0.001.

idiot’s guide to one-way ANOVA in R

R is very powerful but awful to teach students to use. My students almost universally detest it because they have to learn to code. Unfortunately they haven’t much option but to use it to get stats done! (XLSTAT anyone?) So I wanted to throw out a quick post detailing how to carry out an ANOVA in R and interpret the output the easy way. Most importantly I don’t want to get into the mechanics of R or the theory of stats. I’m far from an expert in either but this is the simplest and most common test I need to get students to perform so hopefully this will save them and me a lot of headaches in future.

First you need a data set saved in comma separated value format (.csv) with a header row containing the name of each variable. Open a spreadsheet and copy your data into this format with one row of variable names at the top of the sheet and numbers in the column below. You may only have one header row and you can’t use spaces in the names. If you must use multiple words for variable names separate them with underscores.


In this example each row contains data for one sample. The first four columns characterise the experimental treatments the samples have been subjected too and the mass of sample analysed. You don’t need sample names or numbers, although it doesn’t matter if they are in there. They’re just not necessary for your stats. What the rest of the columns in this data set are doesn’t matter. As it happens they are peak areas from the GC-MS but they could be cell counts or enzyme activity or whatever you’re measuring. You should arrange your data like this in Excel and save the sheet as a .csv file in a convenient folder that you can label something like “stats” or “results”.

Next, open R Studio and create a new script file by pressing Ctrl + Shift + N. Then press Ctrl + S and save it in the same folder as your .csv file with an appropriate name. Then copy this text into it and save it in the same folder as your .csv file:

setwd(“D:\\Chris’s stuff\\scrap”)
data <- read.csv(file=”FINAL RESULTS.csv”,head=TRUE,sep=”,”)
tm <- factor(data$treatment)

# scatterplot
plot(data$Mass ~ tm, xlab=”treatment”, ylab=”mass (mg)”)

# linear model – needed for the ANOVA – we don’t care why!
lm.out = lm(data$Mass ~ tm)

# ANOVA – the value immediately below Pr(>F) is the p-value and the number of stars indicates significance

# post-hoc test to see which treatments were different
pairwise.t.test(data$Mass, tm, p.adj = “none”)

You will need to edit some of the text to reflect your own computer’s file paths. The first line, for example, is just an example and points to the scrap directory on my laptop. You will need to correct this to point to the directory you have created with your .csv file in. To do this press Ctrl + Shift + H, navigate to the folder and click “OK”. This won’t change the script but instead the new relevant piece of code will appear in the console output. Copy and paste this into your script to replace my version. Here’s what it looks like when I do this and select a scrap folder on my other PC:


Note that this isn’t as simple as copying and pasting a folder path in Windows! R requires that the forward slashes in the file path be escaped, or doubled up, so that it doesn’t mistake them as some sort of function. Google it if you want to know more.

Next, you need to replace the name of my .csv file with yours. Make sure you don’t delete the speech marks. You now have two lines of code which will allow you to load your csv file into R. When you run these lines an entry will appear under Data in the Global Environment window. If you double click on it you will see a table of your data open in the Files window.


In this data set the first column contains numbers that are not data but that are a code for the various treatments in this experiment. We don’t want R to treat these numbers as numbers, but as categories. Therefore the third line of code in the script tells R to create something called a factor to use as categories to interpret your data but not to use as numerical values. When you run this line you will see another entry in the Global Environment window confirming that R has created a new factor called “tm” with the numbers as names for each treatment category.

The next line of code allows you to plot your data by category. A useful way of viewing the distribution of your data before you start looking for statistical differences. The two things to plot are specified as “data$Mass ~ tm“. This tells R to take the continuous numerical data in the column titled Mass (N.B. – case sensitive!) from the dataset called “data” and plot it against the factor called “tm“. R will create a boxplot of the numerical data for data$Mass showing the distribution of the data. You should be able to guess, at this stage, whether you are going to get a significant result. I’m pretty sure that treatment 35 is going to be significantly different from treatment 40!


The next few lines of code construct something called a linear model. I’m honestly not sure why R needs this to do the ANOVA. Like I said, I’m not an expert but it won’t work without them so you better go ahead and run them! The output is a table that you don’t need to interpret. Ignore it.

The second to last line is the code to run the actual ANOVA. Running this produces an ANOVA table with the results of your analysis as a p-value under Pr(>F). If p < 0.05 one or more of your treatments is significantly different from the rest. I’ve highlighted my result here.


R is kind enough to annotate your result with stars indicating just how significant it is. My p-value was less than 0.01 so it got two stars. Whoop!

The final piece of code is something called a post-hoc test to determine which treatments are statistically different from each other. This is important as ANOVA doesn’t tell you this. It just tells you that something is different and not what. The pairwise t-test does tell you what and presents a t-statistic for each pair to let you know just how different. Interestingly the first column of results are the same as the output of the linear model. In the pairwise table they’re more easily interpreted and there’s results for all the other pairwise comparisons too, so much more informative.

Importantly, if the ANOVA result is not significant you must accept the null hypothesis, even if the pairwise tests appear to show a statistically significant result. Its not an either/or scenario. ANOVA is more sensitive than t-test and is the determinative component. If the ANOVA is not significant the t-test results are meaningless, whatever their values!

Edit 13-04-2017: The last thing to mention, which really should have been the first, is this: if your data does not comply with the assumptions of ANOVA you cannot use this test anyway. Data for ANOVA must be independent and exhibit homogeneity of variance (heteroscedacity). There is a common misconception that data for ANOVA must be normally distributed too. Tony Underwood has shown that ANOVA is relatively robust to non-normal data sets.

Using Adafruit Fona GSM/GPRS breakout to log data to Xively

Collecting and logging sensor data automatically is one of the most useful “sciencey” things you can do with programmable microprocessors like Arduino and Raspberry Pi. One problem with this is what to do with the data you are collecting. You can store it locally to the device, for example Arduinos can log temperatures or movement signals from PIR sensors to SD cards. However, you have to turn the device off and remove the card, put it into a laptop and copy the data off it before you can access it. Alternatively you can send the data over a serial connection or wirelessly to another device which will store the data for you for easier retrieval or even real-time plotting. Recently small GSM/GPRS modules have become available which are essentially the guts of a mobile phone with no screen, battery or keyboard. These enable you to use mobile phone networks to send data to the internet using protocols such as GPRS and 3G. This is very useful for science as field experimental sites rarely sport ethernet ports or wireless routers through which to transmit data from long-term monitoring sites.

The Adafruit Fona is one of these devices and I obtained one with the intent of monitoring the temperatures in some of our more critical freezers and fridges in case one of them broke down, imperilling the irreplaceable scientific samples stored within. I’ve already set up one device using a GPRS shield from Elecrow. You can see the feed from it here. Its logging temperatures from three freezers in a shed on AUT’s North Campus where various valuable samples are stored. The Fona is a little smaller and a little cheaper and I wanted to make another couple of devices to deploy in our labs to keep an eye on the freezers there. These contain not only valuable samples but, in our molecular labs, thousands of dollars worth of reagents for the MiSeq, our next generation sequencing platform. The idea here is not just to keep an eye on the functionality of the freezers but to get notified if the power goes down. Freezers are well insulated so if the power goes off the temperature won’t immediately rise. If the power goes off on Friday evening, however…

In setting up the Fona I ran into problems getting code to work. The Fona code examples for GPRS access were limited to simple website recovery and not sending HTTP PUT requests, which is what is required to log data to Xively. Frustratingly the code I’d adapted from my previous GPRS logger didn’t work and I could find no indication as to why. Even more frustratingly the Arduino Xively libraries didn’t work either, probably because they had been written for use with WiFi or ethernet and not GPRS. Instructions are sent to the GPRS shields from your microprocessor using a special set of text commands, called AT commands. These commands need to be sent to the shield over a serial connection. The HTTP PUT requests sent to Xively must conform to something called REST, which is explained in somewhat inadequate detail in Xively’s API docs. So, to summarise: I had to use my Arduino to send AT commands to the Fona over serial that would instruct it to send PUT requests conveying my data to Xively. Three different communication protocols. Simples!  >_<

To cut a long story short I managed to use a Python library to send updates from my laptop to Xively by HTTP PUT requests. Using more Python I then managed to log the text of those HTTP requests and use that as a template that I could deconstruct and send to the Fona from my Arduino, inserting real-time data from sensor reads. The result is the code linked at the bottom of this post, which logs temperature, pressure and humidity data from an Adafruit BME280 sensor as well as the battery voltage of the Li-Ion cell connected to the Arduino and Fona. Credit to Limor Fried/Ladyada and Chip McLelland, whose example sketches I adapted to create this.

Here’s a quick pic of my setup. Its a Feather 32u4 LoRa, a uFL Fona and a BME280, all from Adafruit Industries. The Fona must have a lithium battery connected to it so I’ve added an 18650 Li-ion cell which is connected to both the Feather, which charges it, and the Fona. If the power does go out at work then the device will run for several hours on this battery, even without any low-power optimisation of the code. The enclosure is lasercut from 3mm acrylic. Its a beta version as I’ve yet to add mounts and an enclosure for the BME280. Ultimately I want this sketch to read three DS18B20 temperature sensors too, which will be in the freezers, but that’s a work in progress.


Download this code: Adafruit_Feather_Fona_BME280.ino

Lasercut electrophoresis gel tray & combs

Heather and I made some more tweaks to the design from my earlier post, such as adding tabs to the comb supports to make removing them easier. We also added some gaps to the slots for the combs as the lasercutter’s tolerances are incredibly fine, making parts fit really snugly together. So snug that they’ll slot together quite securely without glue. It looks very cool.


The 1.5mm acrylic that I had ordered from RS came in so we took ourselves off to go and cut it. The tray is 6mm clear acrylic and the comb supports are 3mm black acrylic from an offcut we found under the lasercutter. It looks pretty good assembled. (Yes, we reversed every other comb.) We had a couple of issues with the fit between the tabs and cutouts of the tray base and sides but 5 minutes’ work with a file sorted that out so that Heather could superglue it together.


Heather specified the dimensions of the comb teeth so that she could use a 12 channel multipipettor to load amplicons straight from her PCR plates. I’m going to have to ask her to remind me why there’s thirteen teeth…. ? I design things to be fixable so if one of the teeth gets broken you can simply undo the bolts to replace it.


Heather ran some gels on it today and she was very happy, although it turns out the distance between the combs isn’t quite optimal for her samples so in future she’s going to use every other row to give more distance for the bands to separate. Pics to follow tomorrow.

I will post the laser files here when I get a chance too. Or maybe I should upload them to thingiverse??