As a follow-up to this earlier post I think it’s important to note that I have discovered how to search the entire NIST library from AMDIS! It is odd but this doesn’t seem to be well described in the manual, so maybe I can help other GC-MS ninjas to get the most out of their data by describing it here.

The first thing you have to do after loading a results file is to analyse it. You don’t need to use any particular library to do this; you just need AMDIS to identify all the component scans in the sample. Any identities assigned to targets in this step will be overwritten in the following one.

Next you have to select the “Search NIST Library…” option from the “Analyze” [sic] menu.


This will give you a bunch of familiar options. If you haven’t set any reasonably stringent conditions in your earlier analysis step then you might want to specify a threshold for the intensity of the spectra it searches in the highlighted box in the image below. This will save you quite a lot of time. Just make sure you’ve got the mainlib selected in the drop-down box at the bottom left.


I’ve tried this before and it just gave me an error message so I assumed it was a bug. Analytical software tends to be riddled with bugs and bits of it often just don’t work. As long as you try it on a file that’s already been analysed it will work though.

The output is exactly the same as that from any other AMDIS analysis. The only difference that I can see is that there is no way to automate this process for batches of results files, unlike when using a custom library. At this stage I am prepared to accept this trade-off for the extra hits you get from being able to exploit the ENTIRE NIST LIBRARY.


You can see here that I’ve had identities returned for 57 targets, using the filters mentioned above. This particular sample is giant squid ink that has been derivatised with MCF*. When I analyse it using the custom library for MCF derivatives I get 114 targets identified. This is pretty good for MCF derivatives, but when I use the full NIST library to search for matches for all 2592 components I get 192 target matches!

I haven’t done any robust investigations of the accuracy of the full NIST results but they seem superficially relevant. I’m confident the results will at least complement the MCF-specific results. Watch this space for more detail.

*How awesome is that?

AMDIS and metabolite profiling by GC-MS

AMDIS, the Automated Mass spectral Deconvolution and Identification System, is a piece of freeware produced by NIST in the USA. In combination with an appropriate library of reference spectra the software identifies features in your GC-MS data and compares them to library spectra in order to assign identities to them. This is fantastically useful for metabolite profiling, or metabolomics, as the data typically contain hundreds of features, each feature being a unique molecular structure present within your samples. It is possible to achieve similar results using other GC-MS software and associated libraries but none of them seem to be able to automate the process in the way that AMDIS does. That’s because AMDIS also has a batch process function in which you can select a whole set of data files and the software will churn through them, producing a single summary file of data. I have a suspicion that MassHunter might have a similar capability but I have never used it so I cannot comment further.

We have a Thermo Finnigan GC-MS so I am using their Xcalibur software to acquire the data. The Xcalibur browser software allows me to view the collected data and search an associated library to identify peaks, but the selection of peaks and the identification process are all manual. It has no facility for automatically identifying more subtle chromatographic features such as small peaks coeluting beneath larger ones. Therefore not only would it take me a lot longer to get the same information by processing these data files manually, I would inevitably fail to spot features. As the concentration of a metabolite, and consequently the size of the peak it produces in the chromatogram, is not necessarily related to its biological function, this means I could be missing all the important data.
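To make the idea of comparing a feature to library spectra a bit more concrete, here is a minimal Python sketch of one common way of scoring two spectra against each other, cosine similarity. This is not a description of what AMDIS actually does internally; the function names, compound names and peak intensities are all made up for illustration.

```python
import math


def cosine_match(spectrum_a, spectrum_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts.

    Returns a value between 0.0 (no shared peaks) and 1.0 (same pattern).
    """
    shared = set(spectrum_a) & set(spectrum_b)
    dot = sum(spectrum_a[mz] * spectrum_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spectrum_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spectrum_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_hit(unknown, library):
    """Return (name, score) for the best-scoring entry in a {name: spectrum} library."""
    scored = ((name, cosine_match(unknown, ref)) for name, ref in library.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical reference library and unknown feature (invented numbers).
library = {
    "compound_A": {59: 100.0, 88: 40.0, 102: 15.0},
    "compound_B": {43: 100.0, 71: 55.0},
}
unknown = {59: 95.0, 88: 45.0, 102: 10.0}

name, score = best_hit(unknown, library)
print(name, round(score, 3))
```

A real search would also have to deal with noise, retention time and deconvolution of coeluting peaks, which is exactly the part that is painful to do by hand.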

So AMDIS is a wonderful tool. It is, sadly, flawed too. The Xcalibur software uses a NIST library of reference spectra containing over 200,000 spectra. This massive amount of data makes it very good for identifying peaks. Oddly, AMDIS, which is written and maintained by the same people who curate the reference library, cannot use that library. Fortunately NIST publish a file converter program which allows you to convert your NIST libraries into a format AMDIS can access. However, AMDIS still cannot cope with a library larger than about 25,000 spectra and throws an error if you try to analyse data with a larger one. This is only about 12.5% of the full NIST library so, unless you’re prepared to analyse all of your samples eight times, it makes it unusable!
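The arithmetic behind the "eight times" figure can be sketched in a few lines of Python. The numbers are the round figures quoted above (about 200,000 spectra in the full library, about 25,000 per AMDIS library), and the function names are my own invention, not part of any NIST tool.

```python
import math


def chunks_needed(library_size, max_per_library):
    """Number of separate libraries needed to cover the whole collection."""
    return math.ceil(library_size / max_per_library)


def chunk_ranges(library_size, max_per_library):
    """Yield (start, stop) index pairs, one pair per chunk of spectra."""
    for start in range(0, library_size, max_per_library):
        yield start, min(start + max_per_library, library_size)


full_library = 200_000  # approximate size of the full NIST library
amdis_limit = 25_000    # approximate largest library AMDIS will accept

print(chunks_needed(full_library, amdis_limit))   # 8
print(amdis_limit / full_library * 100)           # 12.5 (percent per chunk)
for start, stop in chunk_ranges(full_library, amdis_limit):
    print(start, stop)
```

Since the real library is slightly over 200,000 spectra, the real answer would be at least one chunk more than this.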

The solution seems to be for researchers conducting metabolite profiling to build their own reference libraries for AMDIS to use, based on the derivatisation and analysis of pure standard compounds. Derivatisation is the process of chemically modifying metabolites which would otherwise be non-volatile to make them volatile and, therefore, amenable to gas chromatographic separation. The two most common ways of derivatising metabolites are to attach methyl or trimethylsilyl groups through an ester linkage by treatment with one of a diverse range of derivatising reagents. The reagent I currently use is methyl chloroformate, which, in the presence of methanol and pyridine, converts amine groups into carbamates and carboxyl groups into methyl esters. It does not react with alcohols so it does not allow me to analyse sugars, for example, which require trimethylsilylation.

The practical upshot of this is that some of the molecules which are produced by methyl chloroformate derivatisation are not encountered in nature or in industry. As a result there might not be reference spectra for those molecules in the NIST library, making their identification impossible without some expert-level interpretation of their spectra. Even this will not give a definitive answer but may suggest an identity which will then have to be confirmed by derivatisation and analysis of a pure standard.

Fortunately the potential of GC-MS metabolite profiling is such that several research groups have invested the time and effort over the last ten years or so to build their own reference libraries for use with AMDIS. Two notable examples are the Golm Metabolome Database mass spectrum library [GMD] and the library from Silas Villas-Boas’ group at Auckland University [SVB]. Silas published a library in the supporting information of a paper describing the application of methyl chloroformate derivatisation for metabolite profiling. I have been using their libraries, as well as bits of the NIST library, to try and assign identities to peaks in my own samples using AMDIS.

The version of the GMD library I downloaded contains 2,594 spectra whereas the SVB library contains 223. The SVB library gave me 125 hits from an MCF-derivatised sample of human urine and the GMD library gave me only 5. There are several reasons for this discrepancy. Firstly, the GMD library was composed for profiling plant metabolites rather than cells, which is what the SVB library was built for. Secondly, the GMD library contains TMS derivatives as well as MCF derivatives and so not all of those 2,594 are applicable to my samples. Thirdly, I don’t know what instrument the data for the GMD library was collected on but I know that Silas’ group over the road at the University of Auckland use a Thermo Finnigan GC-MS that is very similar to mine so their reference spectra are going to be very similar. This is not the case with two quite different GC-MSs, which can produce quite different spectra.

I’ve tried analysing the same data using 25,000-spectrum chunks of the NIST library and I’ve found some very dubious hits for compounds which I’m pretty sure aren’t in my urine, such as methamphetamine-d5! This is an excellent example of the problem with this type of analysis: if you search a large enough number of reference spectra you will always end up with “false positives”. This is pointed out in a paper I read today which presents an alternative to AMDIS for processing GC-MS metabolite profiling data and that was also produced by Silas Villas-Boas’ group at the UoA: Identifying and quantifying metabolites by scoring peaks of GC-MS data. So tomorrow I will be mostly playing with R and trying to get some example analysis done using the tools presented in this paper. Hopefully no deuterated methamphetamine will be involved!
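The false positive problem can be illustrated with a toy calculation in Python. It assumes, purely for the sake of argument, that every comparison between a feature and a reference spectrum has the same small, independent chance of producing a spurious match. The per-comparison probability below is invented, not measured, so only the trend matters and not the actual numbers.

```python
def chance_of_false_hit(library_size, p_spurious):
    """Probability of at least one spurious match for a single feature,
    assuming independent comparisons with equal probability p_spurious."""
    return 1.0 - (1.0 - p_spurious) ** library_size


# Hypothetical per-comparison chance of a spurious match (an assumption).
p = 1e-4

# Library sizes mentioned in this post: SVB, GMD and one NIST chunk.
for size in (223, 2_594, 25_000):
    print(size, round(chance_of_false_hit(size, p), 3))
```

Under these assumptions the chance of a bogus identity creeps towards certainty as the library grows, which is why a small library that matches your derivatisation chemistry can be more trustworthy than a huge general one.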