NIST format libraries contain a lot of valuable information that would be even more useful if it could be accessed by other mass spectrometry software, such as MS-DIAL and AMDIS. I have been told that the NIST MS Search Program has an API which allows you to pass spectra to it and obtain results programmatically. Proprietary software, such as Agilent MassHunter and Thermo Freestyle, evidently makes use of it. I understand you can email the NIST people and they will send you details of the API. However, open source software, such as MS-DIAL, doesn’t seem to have implemented that capability. It is therefore necessary to find another way to convert the information encoded in the NIST library into a format accessible by other software.
Public mass spectral libraries are typically distributed in one of several open formats, such as .mgf, .msp and .mzML. These formats are text based and so are readily accessible. MS-DIAL uses .msp format libraries, for example. A description of the .msp text format can be found in this pdf (page 47). The lib2nist program included with the NIST library will allow you to convert the NIST library and similarly encoded data, such as the Wiley Registry, into these accessible formats. I have done this with both the EI and HRMSMS libraries from NIST. Using lib2nist you can export each library to a series of .msp text files. You can’t export the whole library in one go, so you have to limit each export to a certain number of records. I found that 250K records at a time works. Once you’ve exported all your spectra from NIST you can recombine the files using a text editor that can handle large files, such as Vim. If you export the NIST high resolution MS/MS library the combined .msp file is nearly 3GB, so there’s quite a bit of text.
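If you would rather not open a 3GB file in an editor, the recombining step can also be scripted. The following is a minimal sketch, not part of lib2nist: the function name `combine_msp` is made up for illustration, and it assumes that records in an .msp file are separated by a blank line, so it makes sure each chunk ends with one before the next chunk starts.

```python
def combine_msp(parts, combined):
    """Concatenate .msp chunk files into one file, in the order given.

    Works line by line in binary mode, so memory use stays small even
    when the combined library is several GB.
    """
    with open(combined, "wb") as out:
        for part in parts:
            last = b"\n"  # treated as "ended on a blank line" if the chunk is empty
            with open(part, "rb") as src:
                for line in src:
                    out.write(line)
                    last = line
            # Terminate the final line if the chunk had no trailing newline.
            if not last.endswith(b"\n"):
                out.write(b"\n")
            # Assumption: records are separated by a blank line, so add one
            # if the chunk did not already end with a blank line.
            if last.strip():
                out.write(b"\n")
```

A call might look like `combine_msp(sorted(pathlib.Path("export").glob("*.msp")), "nist_combined.msp")`, where the folder and file names are placeholders. Note that `sorted` orders the names alphabetically, and the output file should live outside the folder being globbed.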
The MS-DIAL team provide their own spectral libraries in .msp format incorporating all the available public libraries, such as those from the mass spectral repositories MassBank and GNPS, as well as several libraries kindly shared by individual research groups. If you open these .msp files in a text editor and inspect the content you can see that the keys for the fields in the MS-DIAL libraries differ from those exported by lib2nist. The table below shows how they differ.
|Num peaks||Num Peaks|
Firstly, there are a lot more fields in the NIST library export than in the MS-DIAL library .msp file. Secondly, the NIST field names are in lower case, with capitalised first letters and words separated by underscores. The exceptions are PrecursorMZ and Num peaks, neither of which gets an underscore. There’s no obvious reason why.
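Rather than scrolling through the files by eye, the field keys can be tallied with a few lines of Python. This is a rough sketch with a made-up function name, `count_msp_keys`. It assumes that every field line starts with a letter and has the form `Key: value`, while peak lines start with a digit.

```python
from collections import Counter


def count_msp_keys(lines):
    """Count how often each field key occurs in an iterable of .msp lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and peak lines (assumed to start with a digit).
        if not line or not line[0].isalpha():
            continue
        key, sep, _ = line.partition(":")
        if sep:  # only count lines that actually contain "Key: value"
            counts[key.strip()] += 1
    return counts
```

Passing an open file handle for each library (for example one opened with `open(path, encoding="utf-8", errors="replace")`) and comparing the two sets of keys shows which fields appear in only one of the libraries.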
The MS-DIAL fields aren’t consistent either, with a mixture of capitalised field names without spaces or underscores separating words, and words in lower case, with Num peaks matching the NIST format. The MS-DIAL documentation claims that the case of the field text doesn’t matter. I’m guessing the underscores do, because when I tried importing the .msp exported by lib2nist I got no annotations in MS-DIAL. I wrote a Python script to substitute the MS-DIAL format keys for those in the lib2nist export. However, this didn’t work either. I then stripped out all of the fields from the NIST export that weren’t present in the MS-DIAL library. This did work and I got annotations from MS-DIAL!
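The two steps described above (renaming the keys, then dropping every field MS-DIAL doesn’t use) can be sketched as a single pass over the file. This is not the original script, just an illustration of the idea. The entries in `KEY_MAP` are assumed example pairs and should be replaced with the keys you actually find in your own lib2nist export and MS-DIAL library.

```python
# Assumed example pairs: lower-cased lib2nist key -> MS-DIAL key.
# Replace these with the keys found in your own files.
KEY_MAP = {
    "name": "NAME",
    "precursormz": "PRECURSORMZ",
    "precursor_type": "PRECURSORTYPE",
    "formula": "FORMULA",
    "inchikey": "INCHIKEY",
    "ion_mode": "IONMODE",
    "num peaks": "Num peaks",
}


def convert_msp_lines(lines, key_map=KEY_MAP):
    """Yield .msp lines with mapped keys renamed and unmapped fields dropped."""
    for line in lines:
        stripped = line.strip()
        # Blank lines (record separators) and peak lines pass through unchanged.
        if not stripped or not stripped[0].isalpha():
            yield line
            continue
        # Split on the first colon only, so colons inside values survive.
        key, sep, value = stripped.partition(":")
        new_key = key_map.get(key.strip().lower()) if sep else None
        if new_key is not None:
            yield f"{new_key}: {value.strip()}\n"
        # Any other field line is dropped, mirroring the "strip out" step.
```

Because it is a generator, it can be fed an open input file and written straight to an output file with `out.writelines(convert_msp_lines(src))`, which keeps memory use low for a multi-gigabyte export.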
For the final icing on this cake you can combine the MS-DIAL .msp library with your NIST export to get annotations from both.
This procedure will not work for the NIST mainlib and replib EI libraries. See the follow-up post for instructions on how to convert them.