19th Sep 2006
Manuscript Number: MSB-06-261
Title: Comparative proteomic and transcriptomic profiling of the fission yeast Schizosaccharomyces pombe
Author: Dieter Wolf
Dear Dr. Wolf,
Thank you again for sending your manuscript for consideration by Molecular Systems Biology. We have now heard back from the three referees whom we asked to evaluate your manuscript. As you will see from the reports below, the reviewers find the topic of your study of potential interest. However, they raise several serious concerns, which should be carefully addressed prior to publication.
One of the major concerns expressed by the referees is the accuracy and reproducibility of the quantitation of the proteomic data. Additional validation therefore appears necessary to strengthen the quantitative aspect of the study.
If you feel you can satisfactorily deal with these points and those listed by the referees, you may wish to submit a revised version of your manuscript. Please attach a covering letter giving details of the way in which you have handled each of the points raised by the referees.
Thank you for the opportunity to consider your work for publication. I look forward to your revision.
Molecular Systems Biology
If you do choose to resubmit, please click on the link below to submit the revision online before 8th Nov 2006.
IMPORTANT: When you send the revision, we will require the following items:
1. the manuscript text in LaTeX, RTF or MS Word format
2. a letter with a detailed description of the changes made in response to the referees
3. three to four 'bullet points' highlighting the main findings of your study
4. a 'standfirst text' summarizing the study in two sentences
5. an extended synopsis (full-length Articles only, see www.nature.com/msb/authors) as a separate file in LaTeX, RTF or MS Word format
Please use the link below to access the Licence to Publish. Please complete and sign this on behalf of all authors, with their consent, and fax to the editorial office on +44 1256 810972 on the same day you submit your revision.
We will need this form in order to proceed with processing this submission.
As a matter of course, please make sure that you have correctly followed the instructions for authors as given on the submission website.
Reviewer #1 (Remarks to the Author):
Schmidt et al. use a mass spectrometry approach combined with different prefractionations to detect about one third of the proteome in fission yeast. These data are then compared with similar data in budding yeast and with ORFeome data in fission yeast, revealing both similarities and differences. There is a relative scarcity of genome-wide and especially proteomic data for fission yeast, and this manuscript fills a gap in this respect. Some of the insight gained from the comparisons is also quite intriguing and unique, which should be of interest to a wider audience.
The manuscript, however, has several flaws in its current form, which will need to be addressed to make it suitable for publication. Issues of concern are listed below:
1) The main problem is that there is no validation or test of their ion-counting quantitation. The authors address this only by comparison with a data set on cytokinesis proteins (Fig. 2a). Whilst this shows they are on the right track, it does not indicate how accurate their measurements are. I would call this semi-quantitative rather than truly quantitative. Although the correlation coefficient in Fig. 2a is high, it is based on very few data points. It would be useful to calculate the significance of this correlation, but the main problem is that much of the validity of this paper depends on this limited comparison.
There are several ways to achieve a better validation, such as spiking standard proteins into the sample, showing that the rank order in one experiment is consistent with the other experiments, or comparing against isotope-labelled reagents (ICAT, iTRAQ). This would also provide standard errors/deviations. Also, no information on experimental reproducibility is included, i.e. if sample analysis is repeated, how good is the overlap? Further, I cannot judge to what extent their statistical model actually improves quantitation, since they do not show data comparing it with other theoretical or experimental approaches.
I feel that at the very least this caveat should be included in the discussion so as not to mislead the audience. The issue of experimental reproducibility should be dealt with since this only requires some further analysis of data they already have.
2) Overall, it would be good to indicate P-values besides the correlation coefficients throughout, as the comparisons are based on different numbers of proteins. Correlations can be weak but still significant (or vice versa). For example, on p. 12 the correlation of 0.22 should not be called insignificant without doing the stats.
3) p. 12: the authors make a big point of the fact that transcript levels correlate better with their data than with the ORFeome data by Matsuyama et al. As the genes were all expressed from the same promoter to generate the ORFeome data, and if one assumes that mRNA levels influence protein levels, it is not surprising that there is no correlation.
4) p. 13: The mRNA levels seem to have been determined using the absolute signal intensities from spotted arrays. This is far from ideal as many other factors such as probe length, GC content etc. will influence the signals. It would be much better to use a genomic DNA reference on the same hybridisation to correct for these other factors, or use signals from Affymetrix chips which are more quantitative.
5) Some of the findings are not discussed, or only poorly. For example, why would some components have relatively more protein than mRNA (p. 15)? Why do some groups show negative correlation between mRNA and protein levels (Fig. 3b)? What is the biological meaning of the data shown in Fig. 3c and Fig. 4B?
6) Fig. 2c: The essential and non essential genes together should make up the whole genome. Why then is the median ASC for both of these groups lower than for all genes (12.6 and 7.5 vs 14.6)?
7) Generally, the manuscript seems too long given its content. Especially the end (comparison with S. cerevisiae, from p. 10) is weaker and would benefit from shortening. Some of the differences could easily be caused by subtle differences in experiments (media) as suggested by the finding that glycolysis proteins behave differently (Fig. 5a).
8) p. 10: the GeneDB website should be given rather than repeating GeneDB in parentheses.
9) Fig. 2e: r value is awkwardly placed.
10) Fig. 4c is unnecessary as the 3 values can easily just be stated in the text.
11) The legend of Fig. 5 is missing panel e.
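Point 6 rests on a property of medians: if two groups together make up the whole data set, the median of the pooled set must lie between the two group medians, so an overall median of 14.6 cannot arise from groups with medians of 12.6 and 7.5. The short Python sketch below checks this property; the ASC values in it are invented for illustration and are not taken from the manuscript.

```python
from statistics import median


def pooled_median_within_bounds(group_a, group_b):
    """Check that the median of two pooled groups lies between the group medians.

    This holds whenever the two groups together make up the whole data set.
    """
    low, high = sorted((median(group_a), median(group_b)))
    pooled = median(list(group_a) + list(group_b))
    return low <= pooled <= high, pooled


# Hypothetical ASC values (illustrative only, not from the manuscript).
essential = [3.0, 9.0, 12.6, 15.0, 20.0]     # median 12.6
non_essential = [1.0, 4.0, 7.5, 8.0, 30.0]   # median 7.5

ok, pooled = pooled_median_within_bounds(essential, non_essential)
print(ok, pooled)  # True 8.5
```

If the reported overall median falls outside that range, one possible explanation is that the three medians were computed over different protein sets (for example, including proteins without an essentiality annotation in "all genes"), which would be worth stating in the legend.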
Reviewer #2 (Remarks to the Author):
MSB-06-261 "Comparative proteomic and transcriptomic profiling of the fission yeast Schizosaccharomyces pombe" by Michael W. Schmidt, Andres Houseman, Alexander R. Ivanov, Dieter A. Wolf
The authors used multi-dimensional pre-fractionation and tandem mass spectrometry to comprehensively identify proteins in S. pombe. Following that, they used normalized spectral counts to deduce the relative abundances of these proteins, then looked at the correlation with mRNA expression as determined by gene expression profiling. Finally, the authors determined how gene/protein expression ratios varied for members of the same protein complex or functional pathway.
There were some interesting findings presented in this paper, and it describes a great deal of nicely executed experimental work. However, there are several issues which need to be addressed before publication.
The authors illustrated clearly that they generally detected proteins representative of the entire proteome (ranging in molecular weight, isoelectric point, and GO attributes). I also found it reasonable, given their experimental method, that transmembrane-domain proteins and S. pombe-specific proteins were under-sampled.
The dissection of abundant and non-abundant proteins in terms of functional classifications seemed thorough, and the correlation data regarding mRNA and protein expression were succinctly presented.
The finding of similar mRNA/protein expression ratios for co-clustered proteins was interesting, although the section was not clearly presented (it is not clear what 'we determined the protein-mRNA ratio individually for every protein in a given pathway, family, or complex' means specifically).
In terms of concept, what the researchers see as a linear relationship between gene and protein expression may represent a vast over-simplification. In addition to the overall abundance of a transcript, the coding sequences themselves may vary in translational efficiency (e.g., codon adaptation index). The notion of efficient versus inefficient translation is not addressed in their model. Also, there are presumably a wide variety of post-transcriptional modifications that can alter protein abundance at a constant transcript level. While it is understandable that it would be difficult to include these elements in an analysis model, the authors make no mention of the limitations of directly comparing transcript and protein expression levels, and do not mention potential confounders to their study in this regard.
The article was missing certain key references, specifically when referring to databases or truisms the authors expect to be common knowledge (e.g., 'our comparison reinvigorates the conclusion gained from previous functional genomics studies that similarities in the control of gene expression in the two yeasts ...').
In the introduction, the authors clearly state the limitations of mass spectrometry but not its advantages, and thus do not clearly justify the choice of method at the outset. The authors also state that there is 'potential for interference of epitope tags with endogenous protein function, expression'; however, in the results section they use data obtained from epitope tagging, citing another author as stating that 'tagging did not interfere with normal protein expression'. Lastly, the authors end the introduction by saying that they 'compare mRNA and protein expression profiles' without giving a clear reason (at this point).
The authors outline their experimental fractionation scheme in Fig. 1a, but do not indicate specifically how they derived their identifications: was it the overlap of the fractionation methods used, or the method that obtained the highest fidelity?
In the results section, the authors base the accuracy of the ASC data fitting on 10 cytokinesis proteins, which is a small reference set on which to base the entire method and could therefore be misleading. Can the authors demonstrate the generality of their method? Also, the authors present ASC values in the results section without accompanying p-values: were any statistical tests performed, and if so, what was the nominal significance?
The authors state in the results section that the low protein concentration of phosphoproteins is due to their propensity to be modulated by phosphorylation after expression, but make no mention of the fact that LC-MS (e.g., through ion suppression) may result in poor coverage of phosphopeptides.
Is abundance an overall stoichiometric value ([x] > [y]) or is there a temporal aspect ([x] = [y], Δx > Δy)? The discussion of the eIF complex is confusing here. The authors state that eIF4 is more constitutively expressed than the other complex members, which justifies its high abundance score; however, this does not necessarily mean that eIF4 subunits are expressed at higher concentrations than the other complex members. The authors later compare mRNA expression values (collected at one time point during the cell cycle) and attempt to correlate these with protein expression, reasoning that their protein abundance values correspond to concentration. The authors need to make a clear distinction as to whether their method assays overall protein concentration or aggregate temporal abundance, and compare measures accordingly.
The authors describe the relationship between gene and protein expression in protein complexes, stating that there is a lack of 'or even negative' correlations between these expression levels. The authors further cite the 80S ribosome, the 26S proteasome and the CCT complex as examples of complexes that may have subunits controlled substantially at the posttranscriptional level. However, this finding implies that some subunits of every complex are subject to different levels of post-transcriptional modification, as this would lead to the lack of correlation between gene and protein abundance levels among complexes. This is a fairly significant claim to make, given the support in the literature for unified expression and regulation of protein complexes. If the authors wish to make this claim, they should present more compelling data than they have included in the manuscript (or present all data on the expression of complexes in the supplementary data, which they did not do). Given the number of protein complexes reported in the literature, it seems doubtful that there are no or few other examples of similar gene/protein expression ratios among protein complexes (see work by Gerstein and Greenbaum (Yale University), who found a strong correlation between mRNA and protein levels in stable complexes, but less of a relationship in transient complexes). Perhaps the authors should include a list of complexes that defy this trend, along with a plausible biological explanation of why.
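Both Reviewer #1 (points 1 and 2) and Reviewer #2 ask for significance levels alongside correlation coefficients, because the same r can be significant or not depending on how many proteins are compared. A rough Python sketch, assuming a Pearson correlation and using Fisher's z-transformation as a normal approximation (an illustrative choice, not the authors' method); the two sample sizes are hypothetical.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0 via Fisher's z-transformation.

    Under H0, atanh(r) is approximately normal with standard error 1/sqrt(n - 3),
    so the two-sided p-value is erfc(|z| / sqrt(2)).
    """
    if n <= 3:
        raise ValueError("Fisher's approximation needs more than 3 data points")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# The same r = 0.22 evaluated with two hypothetical sample sizes.
for n in (20, 100):
    print(n, round(correlation_p_value(0.22, n), 3))
```

With these assumed sizes, r = 0.22 is far from significant at n = 20 but falls below 0.05 at n = 100. For small reference sets, such as a handful of cytokinesis proteins, an exact test is preferable; scipy.stats.pearsonr returns both r and a two-sided p-value.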
Reviewer #3 (Remarks to the Author):
In this manuscript, the authors used mass spectrometry and DNA microarrays to identify and quantify a large number of proteins in the fission yeast S. pombe under one condition. The data are organized into functional and physical categories. They correlated mRNA and protein levels based on functional categories using a variety of statistical approaches. Their results are compared with similar data collected from the budding yeast, S. cerevisiae.
The manuscript represents a considerable amount of effort and data analysis. The authors used a number of sophisticated statistical methods to interpret their data. It is a tour de force of data analysis. However, the novelty of the observations is unclear, since similar reports have been published for other organisms. Interesting observations were not followed up experimentally, either by Westerns or by qRT-PCR. It is unclear how the authors merged the MS results from 6 different multidimensional LC-MS/MS approaches to derive a single adjusted spectral count number for each protein. It is hard to believe that all 6 LC-MS/MS methods produce consistent ASC values. What is the variation in the ASC values?
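Reviewer #3's question about the variation of ASC values across the six LC-MS/MS approaches, and Reviewer #1's question about the overlap between repeated analyses, could both be answered with simple summary statistics. A minimal Python sketch, assuming a table of ASC values per protein and per method is available; the values and protein identifiers below are hypothetical placeholders.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a fraction."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV undefined")
    return stdev(values) / m


def jaccard_overlap(ids_a, ids_b):
    """Proteins identified in both runs divided by proteins identified in either run."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical ASC values for one protein across six LC-MS/MS approaches.
asc_by_method = [10.0, 12.0, 9.0, 11.0, 10.0, 8.0]
cv = coefficient_of_variation(asc_by_method)

# Hypothetical protein identifications from two repeated runs.
run_1 = {"protein_A", "protein_B", "protein_C", "protein_D"}
run_2 = {"protein_A", "protein_B", "protein_C", "protein_E"}
overlap = jaccard_overlap(run_1, run_2)

print(f"CV = {cv:.1%}, overlap = {overlap:.2f}")  # CV = 14.1%, overlap = 0.60
```

Reporting, for example, the distribution of per-protein CVs and the pairwise overlaps between repeated runs might address both requests without new experiments.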
1. The normalized spectral counting could stand alone in a separate manuscript in an analytical chemistry or proteomics journal. Figure 2A is not convincing; additional experiments (Westerns) would strengthen this important observation.
2. The authors need to list the components of the 13 different protein complexes. How much variation was observed in the ASC for the different components?
3. One would expect a higher number of phosphorylated and ubiquitylated proteins based on previous large-scale studies. Most of the putatively modified peptides in Table 1 show multiple phosphorylated and ubiquitinated residues on the same peptide. Why are so few peptides singly phosphorylated? The validity of this table is questionable, especially since the annotated spectra are not provided.
4. Figures 2e and 3a and Supplemental Figure 1A are shown in base 2, while others use base 10. Why? The tables using base 2 are not very informative, since most of the data group together into one large group. The value and meaning of Supplemental Figure 5 need to be addressed.
5. The supplemental material is quite extensive.
6. The statistical analysis is impressive.