The Global Proteome Machine
The home of proteomics crowd-sourced "Big Data"
GPMDB Data set of the week
The GPMDB contains tens of thousands of data sets contributed by researchers around the world.
Every week, we select a data set because of its technical excellence, biological interest
or simply because we think it is of general interest to the proteomics community.
Data set of the week: (2012/05/20)
Correct interpretation of comprehensive phosphorylation dynamics requires normalization by protein expression changes. Overall rating: very good data (general interest)
This data set consisted of 15 result files from several
phospho-peptide enrichment/multidimensional chromatography experiments.
It was published by
Wu R, Dephoure N, Haas W, Huttlin EL, Zhai B, Sowa ME, and Gygi SP in
Mol Cell Proteomics. 2011 10:M111.009654 (PubMed).
The data and experiments reported in this paper are part of a general
shift in attitude towards the detection of phosphorylated domains in proteins. Most of the work in
the previous decade has placed considerable emphasis on the technical aspects of identifying phosphopeptides
and the qualitative reporting of their observation. This work (and that of others) is now focused
on how to interpret the observation of phosphorylated protein domains in the context of a cell's
biological function. The experiments performed here were well done, resulting in a nice set of protein
and peptide identifications of the phosphoproteins involved in yeast metabolism.
Data set of the week: (2012/05/13)
Metabolic switches and adaptations deduced from the proteomes of Streptomyces coelicolor wild type and phoP mutant grown in batch culture. Overall rating: very good data (specialist interest)
This data set consisted of 32 LC/MS/MS experiments
that were made available in mzData files via PRIDE.
It was published by
Thomas L, Hodgson DA, Wentzel A, Nieselt K, Ellingsen TE, Moore J, Morrissey ER, Legaie R; STREAM Consortium, Wohlleben W, Rodríguez-García A, Martín JF, Burroughs NJ, Wellington EM, and Smith MC in
Mol Cell Proteomics. 2012 Feb;11(2):M111.013797 (PubMed).
These experiments give a good view into changes to the relative concentrations of many metabolic enzymes
in the environmental bacterium S. coelicolor in response to changes in phosphate-containing nutrient levels.
On the whole the experiments were well done, although there was significant, reproducible suppression of
early eluting peptides in all of the LC/MS/MS runs. This suppression may have made the experiments insensitive to
some particular enzymes. However, for enzymes containing observable peptides with gradient elutions > 20% acetonitrile,
the relative protein regulatory responses could be inferred with reasonable accuracy from this data set.
Data set of the week: (2012/05/07)
Cells lacking β-actin are genetically reprogrammed and maintain conditional migratory capacity. Overall rating: very good data (general interest)
This data set consisted of 2 LC/MS/MS experiments
that were made available in mzData files via PRIDE.
It was published by
Tondeleir D, Lambrechts A, Mueller M, Jonckheere V, Doll T, Vandamme D, Bakkali K, Waterschoot D, Lemaistre M, Debeir O, Decaestecker C, Hinz B, Staes A, Timmerman E, Colaert N, Gevaert K, Vandekerckhove J, and Ampe C in
Mol Cell Proteomics. 2012 Mar 22 (PubMed).
In this study, the authors use an unusual combination of SILAC relative quantitation and
combined fractional diagonal chromatography (COFRADIC) to study what happens to mouse embryonic fibroblast cells
when they lack an important cytoskeletal protein. Rather than the typical SILAC experiment in which heavy lysine and arginine
residues are used, this experimental design uses heavy methionine and COFRADIC to produce fractions enriched in peptides
containing oxidized methionine residues. While the use of an affinity technique has the potential to complicate
quantitative experiments, these experiments seem to have worked out quite well and generated some valuable
insights into the metabolic creativity shown by the fibroblasts in the face of what might seem to be an
insurmountable challenge.
Data set of the week: (2012/04/29)
Kinome analysis of receptor-induced phosphorylation in human natural killer cells. Overall rating: very good data (general interest)
This data set consisted of 3 LC/MS/MS experiments
that were made available in the form of Mascot "DAT" files via TRANCHE.
It was published by
König S, Nimtz M, Scheiter M, Ljunggren HG, Bryceson YT, and Jänsch L. in
PLoS One. 2012 7:e29672 (PubMed).
The results presented in this study make very good use of high accuracy mass measurements of both
parent and fragment ions for their biological application: determining phosphorylation changes in
natural killer (NK) cells caused by changes in receptor stimulation. These cytotoxic leucocytes are known to
have kinome changes associated with such stimulation, but the phosphorylation domain changes associated with
specific stimulations have not been fully explored. This paper makes a start in this type of interesting, cell-specific
investigation that makes use of clinically-derived cells for kinome study.
Data set of the week: (2012/04/22)
Quantification of mRNA and protein and integration with protein turnover in a bacterium. Overall rating: very good data (specialist interest)
This data set consisted of 42 LC/MS/MS runs from single dimension chromatography experiments.
It was published by
Maier T, Schmidt A, Güell M, Kühner S, Gavin AC, Aebersold R, and Serrano L. in
Mol Syst Biol 2011 7:511 (PubMed).
The data in these experiments give a good example of a straightforward analysis of the relationship between
protein and mRNA concentrations in a clinically important model organism, Mycoplasma pneumoniae. The results also
provide the best insights into the proteome of this prokaryote currently available, which has not been thoroughly studied even though
it has a comparatively simple genome and it is one of the primary causes of atypical bacterial pneumonia. The reproducibility
of this data was somewhat compromised by the consistent bias against early eluting peptides in the HPLC runs — very few peptides
that would be expected to elute at < 15% acetonitrile were observed.
Data set of the week: (2012/04/15)
Proteomic and phosphoproteomic comparison of human ES and iPS cells. Overall rating: very good data (general interest)
This data set consisted of 88 LC/MS/MS runs from multidimensional chromatography experiments.
It was published by
Phanstiel DH, Brumbaugh J, Wenger CD, Tian S, Probasco MD, Bailey DJ, Swaney DL, Tervo MA, Bolin JM, Ruotti V, Stewart R, Thomson JA, and Coon JJ in
Nat Methods 2011 8:821-7 (PubMed).
The results here were a good representation of the proteins and phosphorylated domains that could be readily sampled
in human embryonic stem cells and induced pluripotent stem cells. The techniques used were well described and
the measurements were in general very good. The studies were performed using a dual-cell quadrupole linear ion
trap-orbitrap hybrid mass spectrometer (dcQLT-Orbitrap), which produced high resolution, high accuracy parent and fragment ion measurements.
The data was made available through the authors' lab database site, the
Stem Cell-Omics Repository (SCOR).
Data set of the week: (2012/04/08)
Comparison of proteomic and transcriptomic profiles in the bronchial airway epithelium of current and never smokers. Overall rating: excellent data (worth study)
This data set consisted of 589 LC/MS/MS runs of 1D SDS-PAGE gel bands and experimental summaries.
The data was published by
Steiling K, Kadar AY, Bergerat A, Flanigon J, Sridhar S, Shah V, Ahmad QR, Brody JS, Lenburg ME, Steffen M, and Spira A in
PLoS One. 2009 4:e5043 (PubMed).
This excellent study contrasted the proteomes of non- and current-smokers in
a very relevant tissue, bronchial airway epithelium. The results remain the definitive proteome
in this clinical tissue and contain some of the best observations for a number of rarely observed
proteins, such as TPPP3 (tubulin polymerization-promoting protein family member 3), SPATA18 (spermatogenesis associated 18 homolog),
ODF3B (outer dense fiber of sperm tails 3B), SPA17 (sperm autoantigenic protein 17) and ENSP00000387851 (member of the ciliary
rootlet coiled-coil family).
Data set of the week: (2012/04/01)
The matrisome: in silico definition and in vivo characterization by proteomics of normal and tumor extracellular matrices. Overall rating: very good data (specialist interest)
This data set consisted of 98 LC/MS/MS runs and experimental summaries.
The data was published by
Naba A, Clauser KR, Hoersch S, Liu H, Carr SA, and Hynes RO in
Mol Cell Proteomics 2011 mcp.M111.014647 (PubMed).
The idea behind collecting this data set was to define which proteins compose the
extracellular matrix and to discover which proteins would be contributed to the extracellular matrix by
the host in a xenograft experiment. The results do a good job of determining the protein complement of
this material in human tissue. The xenograft experiment — growing human-source tumours in live mice —
clearly shows that both the tumour cells and mouse host tissue contribute to the proteins in the tumour-associated
matrix. The value of the data was somewhat reduced by the relatively large number of detectable chemical artifacts,
particularly the carbamylation and carbamidomethylation of peptide N-termini and lysine side chains.
Data set of the week: (2012/03/26)
Investigating the macropinocytic proteome of Dictyostelium amoebae by high-resolution mass spectrometry. Overall rating: very good data (general interest)
This data set consisted of one large LC/MS/MS run.
The data was published by
Journet A, Klein G, Brugière S, Vandenbrouck Y, Chapel A, Kieffer S, Bruley C, Masselon C, and Aubry L in
Proteomics. 2012 12:241-5 (PubMed).
Dictyostelium discoideum is one of the more peculiar organisms used in research. It is
a free-living "slime mold", commonly found in leaf litter on any temperate forest floor. In this study the
authors have characterized the proteins involved in the unusual method that the amoeboid form of this organism
uses to take in nutrients from the environment: macropinocytosis. The experimental methods used were very well done and the
results significantly extend what is known about both this process and the organism itself.
Data set of the week: (2012/03/18)
Proteogenomic analysis of Candida glabrata using high resolution mass spectrometry. Overall rating: excellent data (worth study)
This data set consisted of 70 LC/MS/MS runs
using both SDS-PAGE protein and SCX peptide separation techniques.
The data was published by
Prasad TS, Harsha HC, Keerthikumar S, Sekhar NR, Selvan LD, Kumar P, Pinto SM, Muthusamy B, Subbannayya Y, Renuse S, Chaerkady R, Mathur PP, Ravikumar R, and Pandey A in
J Proteome Res. 2012 11:247-60 (PubMed).
Candida glabrata is a haploid yeast (a.k.a., Torulopsis glabrata). It was long
thought to be a human commensal organism, but it has been shown to cause pathogenic infections
in immune-compromised individuals. This study of the organism's proteome, performed using FTMS with high resolution for
both the parent and fragment ions, provides a nice insight into the observable proteome of this poorly studied
species. It also provides an excellent set of data to compare with an existing (but relatively untested) genome sequence to
discover novel genes, understand the extent of amino acid polymorphisms and compare the post-translational modification
of domains with other, better studied, yeast species.
Data set of the week: (2012/03/11)
The ethylmalonyl-CoA pathway is used in place of the glyoxylate cycle by Methylobacterium extorquens AM1 during growth on acetate. Overall rating: excellent data (worth study)
This data set consisted of 6 LC/MS/MS
runs from whole cell lysates of the organism grown under specific conditions.
The data was published by
Schneider K, Peyraud R, Kiefer P, Christen P, Delmotte N, Massou S, Portais JC, and Vorholt JA in
J Biol Chem. 2012 287:757-66 (PubMed).
This study effectively defined the observable proteome of Methylobacterium extorquens, a Gram-negative bacterium
that lives on plant leaves. Even though the title of the study suggests that
the study may have limited scope, each LC/MS/MS run generated identifications for ~40% of the proteins coded in the
complete genome. The analysis presented in GPMDB used the proteomes from three strains of the organism (AM1, DM4 and PA1)
to be sure that no genes were absent because of errors in the specific genome assembly of an individual
strain. This analysis showed that the AM1 strain assembly was very good, with only a small number of
proteins from the PA1 and DM4 proteomes found without corresponding AM1 orthologs.
Data set of the week: (2012/03/04)
Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins. Overall rating: very good data (general interest)
This data set consisted of 181 LC/MS/MS
runs from lysates of 11 different laboratory cell lines.
The data was published by
Geiger T, Wehner A, Schaab C, Cox J, and Mann M in
Mol Cell Proteomics 2012 Jan 25 (PubMed).
If you ever wanted to know what proteins were readily observable in
A549, GAMG, HEK293, HeLa, HepG2, K562,
MCF7, RKO, U2OS, Jurkat or LnCap cells, this is the data set for you. It is probably the
largest single data set generated for a publication using the current generation of Orbitrap technology. The
experiments were done using HCD fragmentation and consistent chromatographic and sample
preparation methods. The information is a good complement to the earlier DSOTW "Initial characterization of the human central proteome",
where there is overlapping information generated with conventional CID.
Data set of the week: (2012/02/26)
Systematic phosphorylation analysis of human mitotic protein complexes. Overall rating: very good data (general interest)
This data set consisted of 213 LC/MS/MS
affinity purification experiments.
The data was published by
Hegemann B, Hutchins JR, Hudecz O, Novatchkova M, Rameseder J, Sykora MM, Liu S, Mazanek M, Lénárt P, Hériché JK, Poser I, Kraut N, Hyman AA, Yaffe MB, Mechtler K, and Peters JM in
Sci Signal. 2011 4:rs12. (PubMed).
These results were good examples of the use of proteomics to target an aspect of a particular cell process, in
this case the role of phosphorylation in mitosis. The experimental protocols do a good job of isolating the
relevant proteins and generating easily interpreted phosphopeptide spectra. The chromatography and
mass spectrometry were very well done and consistent across the data set. An unusual feature of this data set was
the presence of relatively strong signals from the protease domain (picornain 3C) of the human rhinovirus B-14 polyprotein. While
it is known that HeLa cells are susceptible to rhinovirus (common cold) infections, this data may be the first
experimental confirmation of a rhinovirus infection in cell culture based on proteomics methods.
Data set of the week: (2012/02/19)
The quantitative proteome of a human cell line. Overall rating: very good data (specialist interest)
This data set consisted of 59 LC/MS/MS
runs from U2-OS cell lysates.
The data was published by
Beck M, Schmidt A, Malmstroem J, Claassen M, Ori A, Szymborska A, Herzog F, Rinner O, Ellenberg J, and Aebersold R. in
Mol Syst Biol. 2011 7:549 (PubMed).
This study provides a large set of consistently good quality, journeyman data focussed on creating a catalog of proteins
present in a common cell line. The U2-OS line was derived from a female sarcoma with very few normal chromosomes and hypertriploid chromosome counts.
The cell culture used appears to have been relatively clean, with little if any evidence of the presence of viruses or Mycoplasma. Any group
interested in quantifying unlabelled proteomics data, investigating rare post-translational modifications or developing
quality control metrics should take a look at this data.
Data set of the week: (2012/02/12)
Comprehensive proteomic analysis of human bile. Overall rating: excellent data (worth study)
This data set consisted of 37 LC/MS/MS
runs and summaries, from multidimensional chromatography experiments.
The data was published by
Barbhuiya MA, Sahasrabuddhe NA, Pinto SM, Muthusamy B, Singh TD, Nanjappa V, Keerthikumar S, Delanghe B, Harsha HC, Chaerkady R, Jalaj V, Gupta S, Shrivastav BR, Tiwari PK, and Pandey A. in
Proteomics. 2011 Dec;11(23):4443-53 (PubMed).
This series of multidimensional chromatography runs using high resolution MS and HCD MS/MS does exactly what
the title says: it provides a comprehensive catalogue of the proteins and constituent peptides that
are to be expected when human bile is analyzed. It contains many best-to-date observations of proteins, even
ones that are not normally associated with bile, such as hornerin and dermcidin. The methods used produced
surprisingly good recovery of cysteine-containing peptides, which are often depleted in proteomics measurements.
Data set of the week: (2012/02/05)
Chemoproteomics profiling of HDAC inhibitors reveals selective targeting of HDAC complexes. Overall rating: very good data (general interest)
This data set consisted of 128 experiments
representing LC/MS/MS runs coupled with targeted affinity purification methods.
The data was published by
Bantscheff M, Hopf C, Savitski MM, Dittmann A, Grandi P, Michon AM, Schlegl J, Abraham Y, Becher I, Bergamini G, Boesche M, Delling M, Dümpelfeld B, Eberhard D, Huthmacher C, Mathieson T, Poeckel D, Reader V, Strunk K, Sweetman G, Kruse U, Neubauer G, Ramsden NG and Drewes G. in
Nat Biotechnol. 2011 29:255-65 (PubMed).
The results demonstrate that the best way to find and quantitate relatively rare proteins is to utilize a targeted-affinity
purification approach. The protocols described in the paper work very well and the measurements were
well done. The peptide identification work in the paper was rather cursory, but that does not affect the biological conclusions or
the validity of the approach.
Data set of the week: (2012/01/29)
Modularity and hormone sensitivity of the Drosophila melanogaster insulin receptor/target of rapamycin interaction proteome. Overall rating: very good data (specialist interest)
This data set consisted of 138 experiments
representing LC/MS/MS runs from individual affinity purification protocols.
The data was published by
Glatter T, Schittenhelm RB, Rinner O, Roguska K, Wepf A, Jünger MA, Köhler K, Jevtov I, Choi H, Schmidt A, Nesvizhskii AI, Stocker H, Hafen E, Aebersold R, and Gstaiger M. in
Mol Syst Biol. 2011 7:547 (PubMed).
This study was a good example of the routine use of good quality proteomics technology to elucidate an interesting
aspect of biology. It examined the protein-protein interactions associated with the InR/TOR pathway in the well-established
Kc167 cell line. The measurements were unambiguous, resulting in a significant number of identifications of relatively
rare D. melanogaster proteins involved in this pathway. It also contained a nice survey of the detectable SNAPs present in this
cell line — fruit flies have a surprisingly large number of nsSNPs compared to mammal genomes.
Data set of the week: (2012/01/22)
Characterization of the Asia Oceania Human Proteome Organisation Membrane Proteomics Initiative Standard using SDS-PAGE shotgun proteomics. Overall rating: very good data (general interest)
This data set consisted of 6 experiments
from LC/MS/MS runs.
The data was published by
Peng L, Kapp EA, McLauchlan D, and Jordan TW. in
Proteomics 2011 11:4376-84 (PubMed).
These experiments provide insight into how straightforward it has become to identify membrane proteins. Using a fairly
simple sample preparation method and LC/MS/MS with an LTQ instrument, the results show that it is possible to easily
identify large numbers of membrane proteins. It is still common for people to suggest that membrane proteins are
"difficult" to analyze using proteomics techniques. These results show that they are really no more difficult than
any other class of protein, so long as they can be kept in solution long enough to be digested.
Data set of the week: (2012/01/15)
Deep proteome and transcriptome mapping of a human cancer cell line. Overall rating: excellent data (leading the field)
This data set consisted of 164 experiments
from multidimensional LC/MS/MS runs.
The data was published by
Nagaraj N, Wisniewski JR, Geiger T, Cox J, Kircher M, Kelso J, Pääbo S, and Mann M. in
Mol Syst Biol. 2011 7:548 (PubMed).
This data set is an extensive investigation of how many peptides can be identified from the limited proteome of a
single human cell line using a combination of straightforward LC/MS/MS
methods, multidimensional chromatography and multiple proteases, adding in high resolution MS/MS via HCD, and doing careful,
consistently state-of-the-art lab work. For the large number of groups that use HeLa cells, this work should serve
as a reference for what can be seen and what sort of experiment should be done to see it. For anyone interested in bioinformatics
and algorithm development, the scale (> 200,000 protein identifications) and precision of the work make it an excellent
example for trying out new ideas. It is also an excellent raw data set to find novel post-translational modifications, splice
variants, viral contaminants and amino acid polymorphisms.
Data set of the week: (2012/01/08)
iPRG-2011: Study Materials for Identification of Electron Transfer Dissociation (ETD) Mass Spectra. Overall rating: very good data (specialist interest)
This data set consisted of 1 SCX fraction
LC/MS/MS run on a Thermo Orbitrap-LTQ hybrid instrument.
The data was made available on TRANCHE by the ABRF iPRG group
(Robert J Chalkley, Nuno Bandeira, John Cottrell, Eric Deutsch, Eugene A. Kapp, Henry H. Lam, W. Hayes McDonald and Thomas Neubert)
and has been described on the iPRG web site.
This rather oddball dataset provides more insight into the "chili-cook-off" mentality associated with
evaluating bioinformatics algorithms than it does into the current real-world problems in biomedical research.
Tests of this sort can be useful when their goals are to provide feedback
to algorithm & user interface designers and to inform users of the characteristics of algorithm performance.
It is questionable whether any of these aims were achieved by analyzing this data set.
The data was artificially removed from context (only one of 21 SCX fractions was made available). The
sample preparation methods used generated very high levels of non-enzymatic cleavage (22% of observable peptides),
unusually high levels of asparagine deamidation (48% of N-containing peptides) and peptide N-terminal glutamine
cyclization (88% of peptides with an N-terminal Q). The mass measurements had large parent ion and fragment
ion systematic errors (+5 ppm and -0.25 Da respectively) and standard deviations (4 ppm and 0.3 Da). The proteins
in the sample were heavily skewed towards the cytosolic proteins and the added human sequence standard proteins (Sigma UPS). The
lack of the other 20 fractions made it impossible to draw any conclusions about the relative observability of
the added UPS proteins (and the ribosomal E. coli protein contaminants in the UPS preparation). It was
very unclear why such a complex, poorly controlled sample/measurement combination was used to test
algorithms and so little information about the true character of the sample was provided to the participating
groups. This hidden complexity resulted in more of an examination of the detective abilities of the groups than a
useful test of the algorithms.
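For readers unfamiliar with how figures such as "+5 ppm" systematic error and "4 ppm" standard deviation are obtained, the following is a minimal sketch of the arithmetic, not the procedure actually used for this analysis. It assumes the systematic error is the mean of the signed differences between observed and theoretical m/z values and the spread is their sample standard deviation. The function names and the example m/z pairs are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed relative mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def summarize_errors(errors):
    """Return (mean, sample standard deviation) of a list of signed errors.

    The mean is taken here as the systematic error and the standard
    deviation as the random spread; this is an assumption of the sketch.
    """
    return mean(errors), stdev(errors)


# Hypothetical (observed, theoretical) parent ion m/z pairs, illustration only.
parent_pairs = [
    (500.0030, 500.0000),
    (750.0030, 750.0000),
    (1000.0060, 1000.0000),
]

parent_ppm = [ppm_error(obs, theo) for obs, theo in parent_pairs]
bias, spread = summarize_errors(parent_ppm)
print(f"parent ion systematic error: {bias:+.1f} ppm, standard deviation: {spread:.1f} ppm")
```

Fragment ion errors quoted in Da would be summarized the same way, using the plain difference (observed minus theoretical) in place of the ppm calculation.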
Data set of the week: (2012/01/01)
Proteomic Analysis of a Pleistocene Mammoth Femur Reveals More than One Hundred Ancient Bone Proteins. Overall rating: excellent data (leading the field)
This data set consisted of 4 data sets
constructed from several different types of experiment.
The data was published by
Cappellini E, Jensen LJ, Szklarczyk D, Ginolhac A, da Fonseca RA, Stafford TW, Holen SR, Collins MJ, Orlando L, Willerslev E, Gilbert MT, and Olsen JV. in
J Proteome Res. 2011 Dec 14 (PubMed).
This data was a truly amazing example of what can be obtained using samples that have
simply sat around outside for 43,000 years. The preservation of the detectable peptides was unexpectedly good.
The experiments were state-of-the-art at all levels and the data should be examined extensively by any
group interested in detecting amino acid polymorphisms associated with evolutionary change. The
analysis in the original paper was correct at the top level (the proteins detected) but was less well done at the level of
amino acid polymorphisms and side chain modifications. There are several more publications' worth of information
in this extraordinary data.
Copyright © 2012, The Global Proteome Machine Organization.