The Global Proteome Machine Organization
The Global Proteome Machine
The home of proteomics crowd-sourced "Big Data"
   GPMDB Data set of the week
The GPMDB contains tens of thousands of data sets contributed by researchers around the world. Every week, we select a data set because of its technical excellence, biological interest or simply because we think it is of general interest to the proteomics community.
Data set of the week: (2012/05/20)
Correct interpretation of comprehensive phosphorylation dynamics requires normalization by protein expression changes.
Overall rating: very good data (general interest)
This data set consisted of 15 result files from several phospho-peptide enrichment/multidimensional chromatography experiments. It was published by Wu R, Dephoure N, Haas W, Huttlin EL, Zhai B, Sowa ME, and Gygi SP in Mol Cell Proteomics. 2011 10:M111.009654 (PubMed).
The data and experiments reported in this paper are part of a general shift in attitude towards the detection of phosphorylated domains in proteins. Most of the work in the previous decade has placed considerable emphasis on the technical aspects of identifying phosphopeptides and the qualitative reporting of their observation. This work (and that of others) is now focused on how to interpret the observation of phosphorylated protein domains in the context of a cell's biological function. The experiments performed here were well done, resulting in a nice set of protein and peptide identifications of the phosphoproteins involved in yeast metabolism.
Data set of the week: (2012/05/13)
Metabolic switches and adaptations deduced from the proteomes of Streptomyces coelicolor wild type and phoP mutant grown in batch culture.
Overall rating: very good data (specialist interest)
This data set consisted of 32 LC/MS/MS experiments that were made available in mzData files via PRIDE. It was published by Thomas L, Hodgson DA, Wentzel A, Nieselt K, Ellingsen TE, Moore J, Morrissey ER, Legaie R; STREAM Consortium, Wohlleben W, Rodríguez-García A, Martín JF, Burroughs NJ, Wellington EM, and Smith MC in Mol Cell Proteomics. 2012 Feb;11(2):M111.013797 (PubMed).
These experiments give a good view into changes to the relative concentrations of many metabolic enzymes in the environmental bacterium S. coelicolor in response to changes in phosphate-containing nutrient levels. On the whole the experiments were well done, although there was significant, reproduced suppression of early eluting peptides in all of the LC/MS/MS runs. This suppression may have made the experiments insensitive to some particular enzymes. However, for enzymes containing observable peptides with gradient elutions > 20% acetonitrile, the relative protein regulatory responses could be inferred with reasonable accuracy from this data set.
Data set of the week: (2012/05/07)
Cells lacking β-actin are genetically reprogrammed and maintain conditional migratory capacity.
Overall rating: very good data (general interest)
This data set consisted of 2 LC/MS/MS experiments that were made available in mzData files via PRIDE. It was published by Tondeleir D, Lambrechts A, Mueller M, Jonckheere V, Doll T, Vandamme D, Bakkali K, Waterschoot D, Lemaistre M, Debeir O, Decaestecker C, Hinz B, Staes A, Timmerman E, Colaert N, Gevaert K, Vandekerckhove J, and Ampe C in Mol Cell Proteomics. 2012 Mar 22 (PubMed).
In this study, the authors use an unusual combination of SILAC relative quantitation and combined fractional diagonal chromatography (COFRADIC) to study what happens to mouse embryonic fibroblast cells when they lack an important cytoskeletal protein. Rather than the typical SILAC experiment in which heavy lysine and arginine residues are used, this experimental design uses heavy methionine and COFRADIC to produce fractions enriched in peptides containing oxidized methionine residues. While the use of an affinity technique has the potential to complicate quantitative experiments, these experiments seem to have worked out quite well and generated some valuable insights into the metabolic creativity shown by the fibroblasts in the face of what might seem to be an insurmountable challenge.
Data set of the week: (2012/04/29)
Kinome analysis of receptor-induced phosphorylation in human natural killer cells.
Overall rating: very good data (general interest)
This data set consisted of 3 LC/MS/MS experiments that were made available in the form of Mascot "DAT" files via TRANCHE. It was published by König S, Nimtz M, Scheiter M, Ljunggren HG, Bryceson YT, and Jänsch L. in PLoS One. 2012 7:e29672 (PubMed).
The results presented in this study make very good use of high accuracy mass measurements of both parent and fragment ions for their biological application: determining phosphorylation changes in natural killer (NK) cells caused by changes in receptor stimulation. These cytotoxic leucocytes are known to have kinome changes associated with such stimulation, but the phosphorylation domain changes associated with specific stimulations have not been fully explored. This paper makes a start in this type of interesting, cell-specific investigation that makes use of clinically-derived cells for kinome study.
Data set of the week: (2012/04/22)
Quantification of mRNA and protein and integration with protein turnover in a bacterium.
Overall rating: very good data (specialist interest)
This data set consisted of 42 LC/MS/MS runs from single dimension chromatography experiments. It was published by Maier T, Schmidt A, Güell M, Kühner S, Gavin AC, Aebersold R, and Serrano L. in Mol Syst Biol 2011 7:511 (PubMed).
The data in these experiments give a good example of a straightforward analysis of the relationship between protein and mRNA concentrations in a clinically important model organism, Mycoplasma pneumoniae. The results also provide the best insights into the proteome of this prokaryote currently available, which has not been thoroughly studied even though it has a comparatively simple genome and it is one of the primary causes of atypical bacterial pneumonia. The reproducibility of this data was somewhat compromised by the consistent bias against early eluting peptides in the HPLC runs: very few peptides that would be expected to elute at < 15% acetonitrile were observed.
Data set of the week: (2012/04/15)
Proteomic and phosphoproteomic comparison of human ES and iPS cells.
Overall rating: very good data (general interest)
This data set consisted of 88 LC/MS/MS runs from multiple-dimensional chromatography experiments. It was published by Phanstiel DH, Brumbaugh J, Wenger CD, Tian S, Probasco MD, Bailey DJ, Swaney DL, Tervo MA, Bolin JM, Ruotti V, Stewart R, Thomson JA, and Coon JJ in Nat Methods 2011 8:821-7 (PubMed).
The results here were a good representation of the proteins and phosphorylated domains that could be readily sampled in human embryonic stem cells and induced pluripotent stem cells. The techniques used were well described and the measurements were in general very good. The studies were performed using a dual-cell quadrupole linear ion trap-orbitrap hybrid mass spectrometer (dcQLT-Orbitrap), which produced high resolution, high accuracy parent and fragment ion measurements. The data was made available through the authors' lab database site, the Stem Cell-Omics Repository (SCOR).
Data set of the week: (2012/04/08)
Comparison of proteomic and transcriptomic profiles in the bronchial airway epithelium of current and never smokers.
Overall rating: excellent data (worth study)
This data set consisted of 589 LC/MS/MS runs of 1D SDS-PAGE gel bands and experimental summaries. The data was published by Steiling K, Kadar AY, Bergerat A, Flanigon J, Sridhar S, Shah V, Ahmad QR, Brody JS, Lenburg ME, Steffen M, and Spira A in PLoS One. 2009 4:e5043 (PubMed).
This excellent study contrasted the proteomes of non- and current-smokers in a very relevant tissue, bronchial airway epithelium. The results remain the definitive proteome in this clinical tissue and contain some of the best observations for a number of rarely observed proteins, such as TPPP3 (tubulin polymerization-promoting protein family member 3), SPATA18 (spermatogenesis associated 18 homolog), ODF3B (outer dense fiber of sperm tails 3B), SPA17 (sperm autoantigenic protein 17) and ENSP00000387851 (member of the ciliary rootlet coiled-coil family).
Data set of the week: (2012/04/01)
The matrisome: in silico definition and in vivo characterization by proteomics of normal and tumor extracellular matrices.
Overall rating: very good data (specialist interest)
This data set consisted of 98 LC/MS/MS runs and experimental summaries. The data was published by Naba A, Clauser KR, Hoersch S, Liu H, Carr SA, and Hynes RO in Mol Cell Proteomics 2011 mcp.M111.014647 (PubMed).
The idea behind collecting this data set was to define which proteins compose the extracellular matrix and to discover which proteins would be contributed to the extracellular matrix by the host in a xenograft experiment. The results do a good job of determining the protein complement of this material in human tissue. The xenograft experiment (growing human-source tumours in live mice) clearly shows that both the tumour cells and mouse host tissue contribute to the proteins in the tumour-associated matrix. The value of the data was somewhat reduced by the relatively large number of detectable chemical artifacts, particularly the carbamylation and carbamidomethylation of peptide N-termini and lysine sidechains.
Data set of the week: (2012/03/26)
Investigating the macropinocytic proteome of Dictyostelium amoebae by high-resolution mass spectrometry.
Overall rating: very good data (general interest)
This data set consisted of one large LC/MS/MS run. The data was published by Journet A, Klein G, Brugière S, Vandenbrouck Y, Chapel A, Kieffer S, Bruley C, Masselon C, and Aubry L in Proteomics. 2012 12:241-5 (PubMed).
Dictyostelium discoideum is one of the more peculiar organisms used in research. It is a free-living "slime mold", commonly found in leaf litter on any temperate forest floor. In this study the authors have characterized the proteins involved in the unusual method that the amoeboid form of this organism uses to take in nutrients from the environment: macropinocytosis. The experimental methods used were very well done and the results significantly extend what is known about both this process and the organism itself.
Data set of the week: (2012/03/18)
Proteogenomic analysis of Candida glabrata using high resolution mass spectrometry.
Overall rating: excellent data (worth study)
This data set consisted of 70 LC/MS/MS runs using both SDS PAGE protein and SCX peptide separation techniques. The data was published by Prasad TS, Harsha HC, Keerthikumar S, Sekhar NR, Selvan LD, Kumar P, Pinto SM, Muthusamy B, Subbannayya Y, Renuse S, Chaerkady R, Mathur PP, Ravikumar R, and Pandey A in J Proteome Res. 2012 11:247-60 (PubMed).
Candida glabrata is a haploid yeast (a.k.a., Torulopsis glabrata). It was long thought to be a human commensal organism, but it has been shown to cause pathogenic infections in immune-compromised individuals. This study of the organism's proteome, performed using FTMS with high resolution for both the parent and fragment ions, provides a nice insight into the observable proteome of this poorly studied species. It also provides an excellent set of data to compare with an existing (but relatively untested) genome sequence to discover novel genes, understand the extent of amino acid polymorphisms and compare the post-translational modification of domains with other, better studied, yeast species.
Data set of the week: (2012/03/11)
The ethylmalonyl-CoA pathway is used in place of the glyoxylate cycle by Methylobacterium extorquens AM1 during growth on acetate.
Overall rating: excellent data (worth study)
This data set consisted of 6 LC/MS/MS runs from whole cell lysates of the organism grown under specific conditions. The data was published by Schneider K, Peyraud R, Kiefer P, Christen P, Delmotte N, Massou S, Portais JC, and Vorholt JA in J Biol Chem. 2012 287:757-66 (PubMed).
This study effectively defined the observable proteome of Methylobacterium extorquens, a Gram-negative bacterium that lives on plant leaves. Even though the title of the study suggests that the study may have limited scope, each LC/MS/MS run generated identifications for ~40% of the proteins coded in the complete genome. The analysis presented in GPMDB used the proteomes from three strains of the organism (AM1, DM4 and PA1) to be sure that no genes were absent because of errors in the specific genome assembly of an individual strain. This analysis showed that the AM1 strain assembly was very good, with only a small number of proteins from the PA1 and DM4 proteomes found without corresponding AM1 orthologs.
Data set of the week: (2012/03/04)
Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins.
Overall rating: very good data (general interest)
This data set consisted of 181 LC/MS/MS runs from lysates of 11 different laboratory cell lines. The data was published by Geiger T, Wehner A, Schaab C, Cox J, and Mann M in Mol Cell Proteomics 2012 Jan 25 (PubMed).
If you ever wanted to know what proteins were readily observable in A549, GAMG, HEK293, HeLa, HepG2, K562, MCF7, RKO, U2OS, Jurkat or LnCap cells, this is the data set for you. It is probably the largest single data set generated for a publication using the current generation of Orbitrap technology. The experiments were done using HCD fragmentation and consistent chromatographic and sample preparation methods. The information is a good complement to the earlier DSOTW, "Initial characterization of the human central proteome", where there is overlapping information generated with conventional CID.
Data set of the week: (2012/02/26)
Systematic phosphorylation analysis of human mitotic protein complexes.
Overall rating: very good data (general interest)
This data set consisted of 213 LC/MS/MS affinity purification experiments. The data was published by Hegemann B, Hutchins JR, Hudecz O, Novatchkova M, Rameseder J, Sykora MM, Liu S, Mazanek M, Lénárt P, Hériché JK, Poser I, Kraut N, Hyman AA, Yaffe MB, Mechtler K, and Peters JM in Sci Signal. 2011 4:rs12. (PubMed).
These results were good examples of the use of proteomics to target an aspect of a particular cell process, in this case the role of phosphorylation in mitosis. The experimental protocols do a good job of isolating the relevant proteins and generating easily interpreted phosphopeptide spectra. The chromatography and mass spectrometry were very well done and consistent across the data set. An unusual feature of this data set was the presence of relatively strong signals from the protease domain (picornain 3C) of the human rhinovirus B-14 polyprotein. While it is known that HeLa cells are susceptible to rhinovirus (common cold) infections, this data may be the first experimental confirmation of a rhinovirus infection in cell culture based on proteomics methods.
Data set of the week: (2012/02/19)
The quantitative proteome of a human cell line.
Overall rating: very good data (specialist interest)
This data set consisted of 59 LC/MS/MS runs from U2-OS cell lysates. The data was published by Beck M, Schmidt A, Malmstroem J, Claassen M, Ori A, Szymborska A, Herzog F, Rinner O, Ellenberg J, and Aebersold R. in Mol Syst Biol. 2011 7:549 (PubMed).
This study provides a large set of consistently good quality, journeyman data focussed on creating a catalog of proteins present in a common cell line. The U2-OS line was derived from a female sarcoma with very few normal chromosomes and hypertriploid chromosome counts. The cell culture used appears to have been relatively clean, with little if any evidence of the presence of viruses or Mycoplasma. Any group interested in quantifying unlabelled proteomics data, investigating rare post-translational modifications or developing quality control metrics should take a look at this data.
Data set of the week: (2012/02/12)
Comprehensive proteomic analysis of human bile.
Overall rating: excellent data (worth study)
This data set consisted of 37 LC/MS/MS runs and summaries, from multidimensional chromatography experiments. The data was published by Barbhuiya MA, Sahasrabuddhe NA, Pinto SM, Muthusamy B, Singh TD, Nanjappa V, Keerthikumar S, Delanghe B, Harsha HC, Chaerkady R, Jalaj V, Gupta S, Shrivastav BR, Tiwari PK, and Pandey A. in Proteomics. 2011 Dec;11(23):4443-53 (PubMed).
This series of multidimensional chromatography runs using high resolution MS and HCD MS/MS did exactly what the title said: it provides a comprehensive catalogue of the proteins and constituent peptides that are to be expected when human bile is analyzed. It contains many best-to-date observations of proteins, even ones that are not normally associated with bile, such as hornerin and dermcidin. The methods used produced surprisingly good recovery of cysteine-containing peptides, which are often depleted in proteomics measurements.
Data set of the week: (2012/02/05)
Chemoproteomics profiling of HDAC inhibitors reveals selective targeting of HDAC complexes.
Overall rating: very good data (general interest)
This data set consisted of 128 experiments representing LC/MS/MS runs coupled with targeted affinity purification methods. The data was published by Bantscheff M, Hopf C, Savitski MM, Dittmann A, Grandi P, Michon AM, Schlegl J, Abraham Y, Becher I, Bergamini G, Boesche M, Delling M, Dümpelfeld B, Eberhard D, Huthmacher C, Mathieson T, Poeckel D, Reader V, Strunk K, Sweetman G, Kruse U, Neubauer G, Ramsden NG and Drewes G. in Nat Biotechnol. 2011 29:255-65 (PubMed).
The results demonstrate that the best way to find and quantitate relatively rare proteins is to utilize a targeted-affinity purification approach. The protocols described in the paper work very well and the measurements were well done. The peptide identification work in the paper was rather cursory, but that does not affect the biological conclusions or the validity of the approach.
Data set of the week: (2012/01/29)
Modularity and hormone sensitivity of the Drosophila melanogaster insulin receptor/target of rapamycin interaction proteome.
Overall rating: very good data (specialist interest)
This data set consisted of 138 experiments representing LC/MS/MS runs from individual affinity purification protocols. The data was published by Glatter T, Schittenhelm RB, Rinner O, Roguska K, Wepf A, Jünger MA, Köhler K, Jevtov I, Choi H, Schmidt A, Nesvizhskii AI, Stocker H, Hafen E, Aebersold R, and Gstaiger M. in Mol Syst Biol. 2011 7:547 (PubMed).
This study was a good example of the routine use of good quality proteomics technology to elucidate an interesting aspect of biology. It examined the protein-protein interactions associated with the InR/TOR pathway in the well-established Kc167 cell line. The measurements were unambiguous, resulting in a significant number of identifications of relatively rare D. melanogaster proteins involved in this pathway. It also contained a nice survey of the detectable SNAPs present in this cell line: fruit flies have a surprisingly large number of nsSNPs compared to mammal genomes.
Data set of the week: (2012/01/22)
Characterization of the Asia Oceania Human Proteome Organisation Membrane Proteomics Initiative Standard using SDS-PAGE shotgun proteomics.
Overall rating: very good data (general interest)
This data set consisted of 6 experiments from LC/MS/MS runs. The data was published by Peng L, Kapp EA, McLauchlan D, and Jordan TW. in Proteomics 2011 11:4376-84 (PubMed).
These experiments provide insight into how straightforward it has become to identify membrane proteins. Using a fairly simple sample preparation method and LC/MS/MS with an LTQ instrument, the results show that it is possible to easily identify large numbers of membrane proteins. It is still common for people to suggest that membrane proteins are "difficult" to analyze using proteomics techniques. These results show that they are really no more difficult than any other class of protein, so long as they can be kept in solution long enough to be digested.
Data set of the week: (2012/01/15)
Deep proteome and transcriptome mapping of a human cancer cell line.
Overall rating: excellent data (leading the field)
This data set consisted of 164 experiments from multidimensional LC/MS/MS runs. The data was published by Nagaraj N, Wisniewski JR, Geiger T, Cox J, Kircher M, Kelso J, Pääbo S, and Mann M. in Mol Syst Biol. 2011 7:548 (PubMed).
This data set is an extensive investigation of how many peptides can be identified from the limited proteome of a single human cell line using a combination of straightforward LC/MS/MS methods, multidimensional chromatography and multiple proteases, adding in high resolution MS/MS via HCD, and doing careful, consistently state-of-the-art lab work. For the large number of groups that use HeLa cells, this work should serve as a reference for what can be seen and what sort of experiment should be done to see it. For anyone interested in bioinformatics and algorithm development, the scale (> 200,000 protein identifications) and precision of the work make it an excellent example for trying out new ideas. It is also an excellent raw data set to find novel post-translational modifications, splice variants, viral contaminants and amino acid polymorphisms.
Data set of the week: (2012/01/08)
iPRG-2011: Study Materials for Identification of Electron Transfer Dissociation (ETD) Mass Spectra.
Overall rating: very good data (specialist interest)
This data set consisted of 1 SCX fraction LC/MS/MS run on a Thermo Orbitrap-LTQ hybrid instrument. The data was made available on TRANCHE by the ABRF iPRG group (Robert J Chalkley, Nuno Bandeira, John Cottrell, Eric Deutsch, Eugene A. Kapp, Henry H. Lam, W. Hayes McDonald and Thomas Neubert) and has been described on the iPRG web site.
This rather oddball dataset provides more insight into the "chili-cook-off" mentality associated with evaluating bioinformatics algorithms than it does into the current real-world problems in biomedical research. Tests of this sort can be useful when their goals are to provide feedback to algorithm & user interface designers and to inform users of the characteristics of algorithm performance. It is questionable whether any of these aims were achieved by analyzing this data set.
The data was artificially removed from context (only one of 21 SCX fractions was made available). The sample preparation methods used generated very high levels of non-enzymatic cleavage (22% of observable peptides), unusually high levels of asparagine deamidation (48% of N-containing peptides) and peptide N-terminal glutamine cyclization (88% of peptides with an N-terminal Q). The mass measurements had large parent ion and fragment ion systematic errors (+5 ppm and -0.25 Da respectively) and standard deviations (4 ppm and 0.3 Da). The proteins in the sample were heavily skewed towards the cytosolic proteins and the added human sequence standard proteins (Sigma UPS). The lack of the other 20 fractions made it impossible to draw any conclusions about the relative observability of the added UPS proteins (and the ribosomal E. coli protein contaminants in the UPS preparation). It was very unclear why such a complex, poorly controlled sample/measurement combination was used to test algorithms and why so little information about the true character of the sample was provided to the participating groups. This hidden complexity resulted in more of an examination of the detective abilities of the groups than a useful test of the algorithms.
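For readers unfamiliar with how figures such as "+5 ppm" and "4 ppm" are obtained, the following is a minimal sketch of the usual arithmetic, assuming the systematic error is taken as the mean of the per-ion mass errors and the spread as their sample standard deviation. The function names and the m/z values are hypothetical and chosen only for illustration; they are not taken from the iPRG data or from the GPMDB analysis.

```python
from statistics import mean, stdev


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def summarize_ppm_errors(pairs):
    """Return (mean error, sample standard deviation) in ppm.

    `pairs` is a list of (observed m/z, theoretical m/z) tuples.
    The mean is treated here as the systematic error (an assumption).
    """
    errors = [ppm_error(obs, theo) for obs, theo in pairs]
    return mean(errors), stdev(errors)


# Hypothetical parent ion measurements, for illustration only.
example_pairs = [
    (500.0005, 500.0),
    (1000.005, 1000.0),
    (750.00675, 750.0),
]

bias_ppm, spread_ppm = summarize_ppm_errors(example_pairs)
print(f"systematic error: {bias_ppm:+.2f} ppm, standard deviation: {spread_ppm:.2f} ppm")
```

The same approach applies to the fragment ion errors quoted in Da, with the absolute difference (observed minus theoretical) used in place of the ppm ratio.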
Data set of the week: (2012/01/01)
Proteomic Analysis of a Pleistocene Mammoth Femur Reveals More than One Hundred Ancient Bone Proteins.
Overall rating: excellent data (leading the field)
This data set consisted of 4 data sets constructed from several different types of experiment. The data was published by Cappellini E, Jensen LJ, Szklarczyk D, Ginolhac A, da Fonseca RA, Stafford TW, Holen SR, Collins MJ, Orlando L, Willerslev E, Gilbert MT, and Olsen JV. in J Proteome Res. 2011 Dec 14 (PubMed).
This data was a truly amazing example of what can be obtained using samples that have simply sat around outside for 43,000 years. The preservation of the detectable peptides was unexpectedly good. The experiments were state-of-the-art at all levels and the data should be examined extensively by any group interested in detecting amino acid polymorphisms associated with evolutionary change. The analysis in the original paper was correct at the top level (the proteins detected) but was less well done at the level of amino acid polymorphisms and side chain modifications. There are several more publications' worth of information in this extraordinary data.
Copyright © 2012, The Global Proteome Machine Organization.