AMDIS and metabolite profiling by GC-MS

AMDIS, the Automated Mass spectral Deconvolution and Identification System, is a piece of freeware produced by NIST in the USA. In combination with an appropriate library of reference spectra, the software identifies features in your GC-MS data and compares them to library spectra in order to assign identities to them. This is fantastically useful for metabolite profiling, or metabolomics, as the data typically contain hundreds of features, each feature being a unique molecular structure present within your samples. It is possible to achieve similar results using other GC-MS software and associated libraries, but none of them seems able to automate the process in the way that AMDIS does. That’s because AMDIS also has a batch processing function: you select a whole set of data files and the software churns through them, producing a single summary file of data. I have a suspicion that MassHunter might have a similar capability but I have never used it so I cannot comment further.

We have a Thermo Finnigan GC-MS so I am using their Xcalibur software to acquire the data. The Xcalibur browser software allows me to view the collected data and search an associated library to identify peaks, but the selection of peaks and the identification process are all manual. It has no facility for automatically identifying more subtle chromatographic features such as small peaks coeluting beneath larger ones. So not only would it take me a lot longer to get the same information by processing these data files manually, I would also inevitably fail to spot features. As the concentration of a metabolite, and consequently the size of the peak it produces in the chromatogram, is not necessarily related to its biological function, this means I could be missing all the important data.

So AMDIS is a wonderful tool. It is, sadly, flawed too. The Xcalibur software uses a NIST library of reference spectra containing over 200,000 spectra. This massive amount of data makes it very good for identifying peaks. Oddly, AMDIS, which is written and maintained by the same people who curate the reference library, cannot read that library directly. Fortunately NIST publish a file converter program which allows you to convert your NIST libraries into a format AMDIS can access. However, AMDIS still cannot cope with a library larger than about 25,000 spectra and throws an error if you try to analyse data with a larger one. That is only about 12.5% of the full NIST library so, unless you’re prepared to analyse all of your samples eight times, the full library is effectively unusable!
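To make the arithmetic above concrete, here is a minimal sketch of how a large library could be cut into AMDIS-sized pieces. It assumes the library has been exported as plain text with one blank-line-separated block per spectrum (as in NIST's MSP text format); the function names and the `AMDIS_LIMIT` constant are hypothetical, and the resulting chunks would still need to go through the NIST converter before AMDIS could read them.

```python
import math
import re

# Approximate ceiling reported in this post, not an official AMDIS figure.
AMDIS_LIMIT = 25_000


def passes_needed(library_size, limit=AMDIS_LIMIT):
    """Number of sub-libraries (and so analysis passes) needed to cover a library."""
    return math.ceil(library_size / limit)


def split_records(library_text, limit=AMDIS_LIMIT):
    """Split blank-line-separated spectrum records into chunks of at most `limit` records."""
    # Assumption: every spectrum is one text block and blocks are separated by blank lines.
    records = [r.strip() for r in re.split(r"\n\s*\n", library_text) if r.strip()]
    return [
        "\n\n".join(records[i:i + limit]) + "\n"
        for i in range(0, len(records), limit)
    ]


print(passes_needed(200_000))  # 8
```

Each chunk means one more full pass over every sample, which is where the eight-fold workload comes from.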

The solution seems to be for researchers conducting metabolite profiling to build their own reference libraries for AMDIS to use, based on the derivatisation and analysis of pure standard compounds. Derivatisation is the process of chemically modifying metabolites which would otherwise be non-volatile to make them volatile and, therefore, amenable to gas chromatographic separation. The two most common ways of derivatising metabolites are to add methyl or trimethylsilyl (TMS) groups by treatment with one of a diverse range of derivatising reagents. The reagent I currently use is methyl chloroformate (MCF), which, in the presence of methanol and pyridine, converts amine groups into carbamates and carboxyl groups into methyl esters. It does not react with alcohols so it does not allow me to analyse sugars, for example, which require trimethylsilylation.

The practical upshot of this is that some of the molecules which are produced by methyl chloroformate derivatisation are not encountered in nature or in industry. As a result there might not be reference spectra for those molecules in the NIST library, making their identification impossible without some expert-level interpretation of their spectra. Even this will not give a definitive answer but may suggest an identity, which will then have to be confirmed by derivatisation and analysis of a pure standard.

Fortunately the potential of GC-MS metabolite profiling is such that several research groups have invested the time and effort over the last ten years or so to build their own reference libraries for use with AMDIS. Two notable examples are the Golm Metabolome Database mass spectrum library [GMD] and the library from Silas Villas-Boas’ group at the University of Auckland [SVB]. Silas published a library in the supporting information of a paper describing the application of methyl chloroformate derivatisation for metabolite profiling. I have been using their libraries, as well as bits of the NIST library, to try to assign identities to peaks in my own samples using AMDIS. The version of the GMD library I downloaded contains 2,594 spectra whereas the SVB library contains 223. The SVB library gave me 125 hits from an MCF-derivatised sample of human urine and the GMD library gave me only 5. There are several reasons for this discrepancy. Firstly, the GMD library was compiled for plant metabolite profiling whereas the SVB library was compiled for cells. Secondly, the GMD library contains TMS derivatives as well as MCF derivatives and so not all of those 2,594 spectra are applicable to my samples. Thirdly, I don’t know what instrument the data for the GMD library was collected on but I know that Silas’ group over the road at the University of Auckland use a Thermo Finnigan GC-MS that is very similar to mine, so their reference spectra are going to be very similar to my own. This is not the case with two quite different GC-MSs, which can produce quite different spectra.
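To illustrate why instrument differences matter for library matching, here is a rough sketch of a spectral comparison using plain cosine similarity. This is a stand-in chosen for simplicity and is not the match factor AMDIS actually calculates; the function name and the two toy spectra are made up for illustration.

```python
import math


def cosine_match(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum cannot match anything
    return dot / (norm_a * norm_b)


# Hypothetical spectra: the same ions, but with different relative intensities,
# as might happen when the same compound is run on two different instruments.
reference = {59: 100.0, 88: 45.0, 116: 20.0}
measured = {59: 60.0, 88: 100.0, 116: 35.0}

print(cosine_match(reference, reference))
print(cosine_match(reference, measured))
```

Identical spectra score 1 and spectra with no ions in common score 0. Shifts in relative intensity pull the score down even when every ion is present, so a library recorded on a similar instrument should score higher against your own data.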

I’ve tried analysing the same data using 25,000-spectrum chunks of the NIST library and I’ve found some very dubious hits for compounds which I’m pretty sure aren’t in my urine. Such as methamphetamine-d5! This is an excellent example of the problem with this type of analysis: if you search a large enough number of reference spectra you will always end up with “false positives”. This is pointed out in a paper I read today which presents an alternative to AMDIS for processing GC-MS metabolite profiling data and which was also produced by Silas Villas-Boas’ group at the UoA: Identifying and quantifying metabolites by scoring peaks of GC-MS data. So tomorrow I will be mostly playing with R and trying to get some example analysis done using the tools presented in this paper. Hopefully no deuterated methamphetamine will be involved!
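The false positive problem can be pictured with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that every unrelated reference spectrum has the same small, independent chance `p_spurious` of passing the match threshold for a given peak. That probability is a made-up number, not something measured from AMDIS or the NIST library; the library sizes are the ones mentioned in this post.

```python
def expected_false_hits(n_spectra, p_spurious):
    """Expected number of spurious matches for one peak searched against n_spectra."""
    return n_spectra * p_spurious


def prob_any_false_hit(n_spectra, p_spurious):
    """Chance of at least one spurious match, assuming independent comparisons."""
    return 1.0 - (1.0 - p_spurious) ** n_spectra


# Hypothetical per-spectrum chance of a spurious match (an assumption, not a measurement).
P_SPURIOUS = 1e-4

for n in (223, 2_594, 25_000):  # SVB library, GMD library, one NIST chunk
    print(n, expected_false_hits(n, P_SPURIOUS), round(prob_any_false_hit(n, P_SPURIOUS), 3))
```

Whatever the real value of the probability, the chance of at least one spurious hit rises steadily with library size. A small, targeted library gives cleaner results than a huge general one.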
