The Global Proteome Machine Organization

   News Archive
Data set of the week: (2012/01/15)
Deep proteome and transcriptome mapping of a human cancer cell line.
This data set consisted of 164 experiments from multidimensional LC/MS/MS runs. The data was published by Nagaraj N, Wisniewski JR, Geiger T, Cox J, Kircher M, Kelso J, Pääbo S, and Mann M in Mol Syst Biol. 2011;7:548 (PubMed).
This data set is an extensive investigation of how many peptides can be identified from the limited proteome of a single human cell line. It combines straightforward LC/MS/MS methods, multidimensional chromatography, multiple proteases and high resolution MS/MS via HCD with careful, consistently state-of-the-art lab work. For the large number of groups that use HeLa cells, this work should serve as a reference for what can be seen and what sort of experiment should be done to see it. For anyone interested in bioinformatics and algorithm development, the scale (> 200,000 protein identifications) and precision of the work make it an excellent example for trying out new ideas. It is also an excellent raw data set in which to look for novel post-translational modifications, splice variants, viral contaminants and amino acid polymorphisms.
Data set of the week: (2012/01/08)
iPRG-2011: Study Materials for Identification of Electron Transfer Dissociation (ETD) Mass Spectra.
This data set consisted of 1 SCX fraction LC/MS/MS run on a Thermo Orbitrap-LTQ hybrid instrument. The data was made available on TRANCHE by the ABRF iPRG group (Robert J Chalkley, Nuno Bandeira, John Cottrell, Eric Deutsch, Eugene A. Kapp, Henry H. Lam, W. Hayes McDonald and Thomas Neubert) and has been described on the iPRG web site.
This rather oddball data set provides more insight into the "chili cook-off" mentality associated with evaluating bioinformatics algorithms than it does into the current real-world problems in biomedical research. Tests of this sort can be useful when their goals are to provide feedback to algorithm and user interface designers and to inform users of the characteristics of algorithm performance. It is questionable whether any of these aims was achieved by analyzing this data set.
The data was artificially removed from context (only one of 21 SCX fractions was made available). The sample preparation methods used generated very high levels of non-enzymatic cleavage (22% of observable peptides), unusually high levels of asparagine deamidation (48% of N-containing peptides) and peptide N-terminal glutamine cyclization (88% of peptides with an N-terminal Q). The mass measurements had large systematic errors for parent ions and fragment ions (+5 ppm and -0.25 Da, respectively) and large standard deviations (4 ppm and 0.3 Da, respectively). The proteins in the sample were heavily skewed towards the cytosolic proteins and the added human sequence standard proteins (Sigma UPS). The lack of the other 20 fractions made it impossible to draw any conclusions about the relative observability of the added UPS proteins (and the ribosomal E. coli protein contaminants in the UPS preparation). It was very unclear why such a complex, poorly controlled sample/measurement combination was used to test algorithms, and why so little information about the true character of the sample was provided to the participating groups. This hidden complexity resulted in more of an examination of the detective abilities of the groups than a useful test of the algorithms.
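For readers unfamiliar with how figures such as the systematic error and standard deviation quoted above are obtained, the following is a minimal sketch in Python. It is not the procedure used to analyze the iPRG data: the function names and the m/z values are hypothetical, chosen only to illustrate the arithmetic. Parent ion errors are treated as relative (ppm) and fragment ion errors as absolute (Da).

```python
from statistics import mean, stdev


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def summarize(errors):
    """Return (systematic error, standard deviation) for a list of errors.

    The systematic error is taken here to be the mean of the errors.
    """
    return mean(errors), stdev(errors)


# Hypothetical (observed, theoretical) parent ion m/z pairs, illustration only.
parent_pairs = [
    (500.0030, 500.0000),
    (750.0030, 750.0000),
    (1000.0060, 1000.0000),
]
parent_ppm = [ppm_error(obs, theo) for obs, theo in parent_pairs]
parent_bias, parent_sd = summarize(parent_ppm)

# Hypothetical fragment ion pairs; these errors are kept as absolute values in Da.
fragment_pairs = [(300.10, 300.40), (450.05, 450.25), (600.00, 600.25)]
fragment_da = [obs - theo for obs, theo in fragment_pairs]
fragment_bias, fragment_sd = summarize(fragment_da)

print(f"parent ions:   {parent_bias:+.2f} ppm systematic, {parent_sd:.2f} ppm s.d.")
print(f"fragment ions: {fragment_bias:+.2f} Da systematic, {fragment_sd:.2f} Da s.d.")
```

In practice the theoretical values would come from confidently assigned peptide identifications, so the estimate is only as good as those assignments.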
New Editions of the Human and Mouse Proteome Guides Released (2012.01.03)
The latest edition (2012.01.01) of both the GPM Homo sapiens and Mus musculus Proteome Guides has been made available. The Guides are the results of an automated curation of the >200 million human and >50 million mouse peptide identifications in GPMDB. The Guides use ENSEMBL v. 62 protein sequences, and their chromosome coordinates are aligned to the human GRCh37 and mouse NCBIM37 genome builds, respectively. The Guides are available either as spreadsheets or in HTML format, and they may be downloaded either from the links above or from the GPM Annotation Project ftp server.
Data set of the week: (2012/01/01)
Proteomic Analysis of a Pleistocene Mammoth Femur Reveals More than One Hundred Ancient Bone Proteins.
This data set consisted of 4 data sets constructed from several different types of experiment. The data was published by Cappellini E, Jensen LJ, Szklarczyk D, Ginolhac A, da Fonseca RA, Stafford TW, Holen SR, Collins MJ, Orlando L, Willerslev E, Gilbert MT, and Olsen JV in J Proteome Res. 2011 Dec 14 (PubMed).
This data was a truly amazing example of what can be obtained using samples that have simply sat around outside for 43,000 years. The preservation of the detectable peptides was unexpectedly good. The experiments were state-of-the-art at all levels and the data should be examined extensively by any group interested in detecting amino acid polymorphisms associated with evolutionary change. The analysis in the original paper was correct at the top level (the proteins detected) but was less well done at the level of amino acid polymorphisms and side chain modifications. There are several more publications' worth of information in this extraordinary data.
Copyright © 2011, The Global Proteome Machine Organization