|
The X! Hunter Annotated Spectrum Library (ASL) system uses a binary format to
store the spectra and annotations. This format was designed to make loading
the data from the libraries as fast as possible. The structure of this binary
format and all of the required data fields are specified below. All storage
is in little-endian format.
The initial release of this file format used only the first 4 bytes of the file for
"header" information. In this release, the first 256 bytes are reserved
for header information. The format of this header is as follows:
- 4-byte int: all 4 bytes = 0x00;
- 4-byte unsigned int: number of spectra in file, T; and
- 248-byte char array: unassigned (may be 0x00).
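The header layout above can be read with a few lines of Python. This is only a minimal sketch: the function name `read_asl_header` and the constant `HEADER_SIZE` are illustrative and are not part of X! Hunter. It assumes the header is exactly as listed (little-endian, no padding between fields).

```python
import struct

HEADER_SIZE = 256  # bytes reserved for the header in this release


def read_asl_header(buf: bytes) -> int:
    """Return the number of spectra declared in a 256-byte ASL header."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("header is shorter than 256 bytes")
    # "<" selects little-endian with no padding: int32 marker, then uint32 count.
    marker, n_spectra = struct.unpack_from("<iI", buf, 0)
    if marker != 0:
        raise ValueError("first 4 bytes are not all 0x00")
    # The remaining 248 bytes are unassigned and are ignored here.
    return n_spectra
```

A caller would typically pass the first 256 bytes of the file, e.g. `read_asl_header(f.read(256))` on a file opened in binary mode.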
The annotations and spectra are stored sequentially, as in the previous
format. The median expectation value of the spectrum set used to construct
each library entry is now included in the file. Each entry uses the
following format:
- 8-byte double: parent ion M+H (Daltons);
- 4-byte int: parent ion charge;
- 4-byte float: sum of the squares of the fragment ion intensities;
- 4-byte float: median expectation value of spectra;
- 4-byte int: length of the peptide sequence, L;
- L-byte char array: peptide sequence;
- 4-byte int: number of spectrum intensity-m/z pairs, P;
- P-byte unsigned char array: spectrum intensities;
- P*4-byte float array: spectrum m/z values;
- 4-byte int: number of sequence modifications, M;
- M modification objects, each containing:
  - 4-byte int: modification sequence position;
  - 8-byte double: modification mass;
- 4-byte int: number of protein sequences containing the peptide, N;
- N protein objects, each containing:
  - 4-byte int: length of protein sequence accession string, S;
  - S-byte char array: protein sequence accession string;
  - 4-byte int: position of peptide in protein sequence;
- Repeat the entry structure above until all T spectra (the count given in the header) are loaded.
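The per-entry layout can be sketched as a sequential reader in Python. This is an illustration under stated assumptions, not the X! Hunter implementation: the names `read_asl_entry`, `_take` and `_unpack` and the dictionary keys are hypothetical, character arrays are assumed to be ASCII, and the fields are assumed to be packed without padding in exactly the order listed.

```python
import io
import struct


def _take(stream, size):
    """Read exactly `size` bytes or raise if the stream ends early."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("truncated ASL entry")
    return data


def _unpack(stream, fmt):
    """Unpack one struct; callers pass '<' formats (little-endian, unpadded)."""
    return struct.unpack(fmt, _take(stream, struct.calcsize(fmt)))


def read_asl_entry(stream):
    """Parse one library entry, field by field in the documented order."""
    mh, charge, sum_sq, median_expect, seq_len = _unpack(stream, "<diffi")
    sequence = _take(stream, seq_len).decode("ascii")  # encoding is assumed

    (n_peaks,) = _unpack(stream, "<i")
    intensities = list(_take(stream, n_peaks))  # one unsigned byte per peak
    mz_values = list(_unpack(stream, f"<{n_peaks}f"))

    (n_mods,) = _unpack(stream, "<i")
    mods = [_unpack(stream, "<id") for _ in range(n_mods)]  # (position, mass)

    (n_proteins,) = _unpack(stream, "<i")
    proteins = []
    for _ in range(n_proteins):
        (acc_len,) = _unpack(stream, "<i")
        accession = _take(stream, acc_len).decode("ascii")
        (position,) = _unpack(stream, "<i")
        proteins.append((accession, position))

    return {
        "parent_mh": mh,
        "charge": charge,
        "sum_sq_intensity": sum_sq,
        "median_expect": median_expect,
        "sequence": sequence,
        "intensities": intensities,
        "mz": mz_values,
        "modifications": mods,
        "proteins": proteins,
    }
```

After skipping the 256-byte header, a loader would call `read_asl_entry` once per spectrum, T times in total, on the same open binary stream.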
NOTES:
- The spectra are not stored in any particular order: spectra associated with the
same protein may be located anywhere within the file.
- Annotations are based on sequence accession numbers for particular sequence collections,
e.g., ENSEMBL, IPI or SWISS-PROT protein accession numbers.
- X! Hunter ASLs store the twenty (20) most intense peaks for a particular MS/MS spectrum.
- Parent ion masses are calculated from the monoisotopic masses of the peptide residues.
|