The Global Proteome Machine Organization
News Archive
Data set of the week: (2012/01/15)
Deep proteome and transcriptome mapping of a human cancer cell line.
This data set consisted of 164 experiments
from multidimensional LC/MS/MS runs.
The data was published by
Nagaraj N, Wisniewski JR, Geiger T, Cox J, Kircher M, Kelso J, Pääbo S, and Mann M. in
Mol Syst Biol. 2011 7:548 (PubMed).
This data set is an extensive investigation of how many peptides can be identified from the limited proteome of a
single human cell line using a combination of straightforward LC/MS/MS
methods, multidimensional chromatography and multiple proteases, adding in high resolution MS/MS via HCD, and doing careful,
consistently state-of-the-art lab work. For the large number of groups that use HeLa cells, this work should serve
as a reference for what can be seen and what sort of experiment should be done to see it. For anyone interested in bioinformatics
and algorithm development, the scale (> 200,000 protein identifications) and precision of the work make it an excellent
example for trying out new ideas. It is also an excellent raw data set for finding novel post-translational modifications, splice
variants, viral contaminants and amino acid polymorphisms.
Data set of the week: (2012/01/08)
iPRG-2011: Study Materials for Identification of Electron Transfer Dissociation (ETD) Mass Spectra.
This data set consisted of 1 SCX fraction
LC/MS/MS run on a Thermo Orbitrap-LTQ hybrid instrument.
The data was made available on TRANCHE by the ABRF iPRG group
(Robert J Chalkley, Nuno Bandeira, John Cottrell, Eric Deutsch, Eugene A. Kapp, Henry H. Lam, W. Hayes McDonald and Thomas Neubert)
and has been described on the iPRG web site.
This rather oddball data set provides more insight into the "chili cook-off" mentality associated with
evaluating bioinformatics algorithms than it does into the current real-world problems in biomedical research.
Tests of this sort can be useful when their goals are to provide feedback
to algorithm and user interface designers and to inform users of the characteristics of algorithm performance.
It is questionable whether any of these aims were achieved by analyzing this data set.
The data was artificially removed from context (only one of 21 SCX fractions was made available). The
sample preparation methods used generated very high levels of non-enzymatic cleavage (22% of observable peptides),
unusually high levels of asparagine deamidation (48% of N-containing peptides) and peptide N-terminal glutamine
cyclization (88% of peptides with an N-terminal Q). The mass measurements had large parent ion and fragment
ion systematic errors (+5 ppm and -0.25 Da respectively) and standard deviations (4 ppm and 0.3 Da). The proteins
in the sample were heavily skewed towards the cytosolic proteins and the added human sequence standard proteins (Sigma UPS). The
lack of the other 20 fractions made it impossible to draw any conclusions about the relative observability of
the added UPS proteins (and the ribosomal E. coli protein contaminants in the UPS preparation). It is
unclear why such a complex, poorly controlled sample/measurement combination was used to test
algorithms, and why so little information about the true character of the sample was provided to the participating
groups. This hidden complexity resulted in more of an examination of the detective abilities of the groups than a
useful test of the algorithms.
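The parent ion figures quoted above (a systematic error and a standard deviation, both in ppm) can be estimated from any list of confident identifications. The following is a minimal sketch only, not the procedure used for this review: the function names and the example m/z values are hypothetical and chosen purely for illustration. Fragment ion errors are handled the same way, except that the plain difference in Da is used instead of the relative difference in ppm.

```python
from statistics import mean, stdev


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error of one measurement, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def summarize_errors(pairs):
    """Return (systematic error, standard deviation) in ppm.

    `pairs` is an iterable of (observed m/z, theoretical m/z) tuples.
    The systematic error is taken as the mean of the individual errors.
    """
    errors = [ppm_error(observed, theoretical) for observed, theoretical in pairs]
    return mean(errors), stdev(errors)


# Hypothetical parent ion measurements (NOT taken from the iPRG data).
example_pairs = [
    (500.0030, 500.0000),
    (800.0032, 800.0000),
    (1000.0050, 1000.0000),
]

bias_ppm, spread_ppm = summarize_errors(example_pairs)
print(f"systematic error: {bias_ppm:+.1f} ppm, standard deviation: {spread_ppm:.1f} ppm")
```

A non-zero systematic error of this kind can be subtracted before searching, which allows a narrower parent ion tolerance to be used; the standard deviation sets how narrow that tolerance can reasonably be.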
The latest editions (2012.01.01) of both the GPM Homo sapiens and
Mus musculus Proteome Guides have been made available. The Guides
are the results of an automated curation of the >200 million human and >50 million mouse peptide identifications in
GPMDB. The Guides use ENSEMBL v. 62 protein sequences and their chromosome coordinates
are aligned to the human GRCh37 genome and mouse NCBIM37 genome builds, respectively. The Guides are available either as spreadsheets or in HTML format and
they may be downloaded either from the links above or from the GPM Annotation Project FTP server.
Data set of the week: (2012/01/01)
Proteomic Analysis of a Pleistocene Mammoth Femur Reveals More than One Hundred Ancient Bone Proteins.
This data set consisted of 4 data sets
constructed from several different types of experiment.
The data was published by
Cappellini E, Jensen LJ, Szklarczyk D, Ginolhac A, da Fonseca RA, Stafford TW, Holen SR, Collins MJ, Orlando L, Willerslev E, Gilbert MT, and Olsen JV. in
J Proteome Res. 2011 Dec 14 (PubMed).
This data was a truly amazing example of what can be obtained using samples that have
simply sat around outside for 43,000 years. The preservation of the detectable peptides was unexpectedly good.
The experiments were state-of-the-art at all levels and the data should be examined extensively by any
group interested in detecting amino acid polymorphisms associated with evolutionary change. The
analysis in the original paper was correct at the top level (the proteins detected) but was less well done at the level of
amino acid polymorphisms and side chain modifications. There are several more publications' worth of information
in this extraordinary data.
Copyright © 2011, The Global Proteome Machine Organization