Abstract
This paper introduces a statistical analysis of peer review reports in order to approach their “quality” through quantitative measures, thereby leading to quality metrics. Peer review reports for the Journal of the Serbian Chemical Society are examined. The text of each report first has to be adapted to word-counting software in order to avoid confusion induced by jargon when measuring word frequencies: e.g., “C” must be distinguished according to whether it means carbon or Celsius. Thus, every report has to be carefully “rewritten”. Thereafter, the quantity, variety, and distribution of words are examined in each report and compared with those of the whole set. Reports are split into two groups, according to the month in which they arrived, in order to detect possible hidden spurious effects; the two groups are found to be coherent. An empirical distribution is sought through a Zipf–Pareto rank-size law. It is observed that peer review reports are very far from usual texts in this respect. Deviations from the usual (first) Zipf’s law are discussed. A theoretical suggestion for the “best (or worst) report”, and by extension the “good (or bad) reviewer”, within this context, is provided by an entropy argument, through the concept of “distance to average” behavior. Another entropy-based measure also allows one to characterize a journal’s reviews (whence its reviewers) for further comparison with other journals through their own reviewer reports.
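The analysis outlined above can be sketched in a few lines of code. The following is a minimal illustration, not the authors’ actual pipeline: it counts word frequencies in a (pre-adapted) report, fits a Zipf-like rank-size power law \(f(r) \sim r^{-\alpha}\) by least squares on log-log scales, and computes the Shannon entropy of the empirical word distribution, on which a “distance to average” comparison could be built.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Word frequencies sorted in decreasing order (rank 1 = most frequent)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_exponent(freqs):
    """Fit log f = const - alpha * log r by least squares; return (alpha, R^2)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    ss_res = sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return -slope, 1.0 - ss_res / ss_tot


def shannon_entropy(freqs):
    """Shannon entropy (in bits) of the empirical word distribution."""
    total = sum(freqs)
    return -sum(f / total * math.log2(f / total) for f in freqs)
```

For a text following the first Zipf law exactly, `zipf_exponent` would return an exponent near 1; the paper’s point is that reviewer reports deviate markedly from this value.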
References
Ausloos, M. (2012a). Generalized Hurst exponent and multifractal function of original and translated texts mapped into frequency and length time series. Physical Review E, 86, 031108.
Ausloos, M. (2012b). Measuring complexity with multifractals in texts: Translation effects. Chaos, Solitons and Fractals, 45, 1349–1357.
Ausloos, M. (2013). A scientometrics law about co-authors and their ranking: The co-author core. Scientometrics, 95, 895–909.
Bornmann, L. (2011). Scientific peer review. Annual Review of Information Science and Technology, 45, 199–245.
Bougrine, H. (2014). Subfield effects on the core of coauthors. Scientometrics, 98, 1047–1064.
Callaham, M. L., Wears, R. L., & Waeckerle, J. F. (1998). Effect of attendance at a training session on peer reviewer quality and performance. Annals of Emergency Medicine, 32, 318–322.
Cristelli, M., Batty, M., & Pietronero, L. (2012). There is more than a power law in Zipf. Scientific Reports, 2, 812. doi:10.1038/srep00812.
Darooneh, A. H., & Shariati, A. (2014). Metrics for evaluation of the author’s writing styles: Who is the best? Chaos: An Interdisciplinary Journal of Nonlinear Science, 24, 033132.
Dubois, D. M. (2014). Computational language related to recursion, incursion and fractal. In F. Lowenthal & L. Lefebvre (Eds.), Language and recursion (pp. 149–165). New York: Springer.
Fairthorne, R. A. (1969). Empirical hyperbolic distributions (Bradford–Zipf–Mandelbrot) for bibliometric description and prediction. Journal of Documentation, 25, 319–343.
Febres, G., & Jaffe, K. (2014). Quantifying literature quality using complexity criteria. arXiv:1401.7077.
Ferrer i Cancho, R. (2006). When language breaks into pieces: A conflict between communication through isolated signals and language. BioSystems, 84, 242–253.
Feurer, I. D., Becker, G. J., Picus, D., Ramirez, E., Darcy, M. D., & Hicks, M. E. (1994). Evaluating peer reviews: Pilot testing of a grading instrument. JAMA, 272, 98–100.
Godlee, F., Gale, C. R., & Martyn, C. N. (1998). Effect on the quality of peer review of blinding peer reviewers and asking them to sign their reports: A randomized control trial. JAMA, 280, 237–240.
Goodman, S. N., Berlin, J., Fletcher, S. W., & Fletcher, R. H. (1994). Manuscript quality before and after peer review and editing at Annals of Internal Medicine. Annals of Internal Medicine, 121, 11–21.
Hill, B. M. (1974). The rank-frequency form of Zipf’s law. Journal of the American Statistical Association, 69, 1017–1026.
Jadad, A. R., Cook, D. J., Jones, A., Klassen, T. P., Tugwell, P., Moher, M., et al. (1998). Methodology and reports of systematic reviews and meta-analyses: A comparison of Cochrane reviews with articles published in paper-based journals. JAMA, 280, 278–280.
Justice, A. C., Cho, M. K., Winker, M. A., Berlin, J. A., & Rennie, D. (1998). Does masking author identity improve peer review quality? A randomized controlled trial. JAMA, 280, 240–242.
Laherrere, J., & Sornette, D. (1998). Stretched exponential distributions in nature and economy: “Fat tails” with characteristic scales. The European Physical Journal B-Condensed Matter and Complex Systems, 2, 525–539.
Lin, S. (2010). Rank aggregation methods. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 555–570.
McCowan, B., Doyle, L. R., & Hanser, S. F. (2002). Using information theory to assess the diversity, complexity, and development of communicative repertoires. Journal of Comparative Psychology, 116, 166.
McKean, J. W., Terpstra, J. T., & Kloke, J. D. (2009). Computational rank-based statistics. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 132–140.
McNutt, R. A., Evans, A. T., Fletcher, R. H., & Fletcher, S. W. (1990). The effects of blinding on the quality of peer-review: A randomized trial. JAMA, 263, 1371–1376.
Miskiewicz, J. (2013). Effects of publications in proceedings on the measure of the core size of coauthors. Physica A, 392, 5119–5131.
Neuhauser, D., & Koran, C. J. (1989). Calling Medical Care reviewers first: A randomized trial. Medical Care, 27, 664–666.
Oxman, A. D., Guyatt, G. H., & Singer, J. (1991). Agreement among reviewers of review articles. Journal of Clinical Epidemiology, 44, 91–98.
Publishing Research Consortium. (2008). Peer review in scholarly journals: Perspective of the scholarly community: An international study.
Rodriguez, E., Aguilar-Cornejo, M., Femat, R., & Alvarez-Ramirez, J. (2014). Scale and time dependence of serial correlations in word-length time series of written texts. Physica A, 414, 378–386.
Shannon, C. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423.
Shannon, C. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30, 50–64.
Siler, K., Lee, K., & Bero, L. (2015). Measuring the effectiveness of scientific gatekeeping. Proceedings of the National Academy of Sciences, 112, 360–365.
Strayhorn, J., Jr., McDermott, J. F., Jr., & Tanguay, P. (1993). An intervention to improve the reliability of manuscript reviews for the Journal of the American Academy of Child and Adolescent Psychiatry. The American Journal of Psychiatry, 150, 947–952.
van Rooyen, S., Godlee, F., Evans, S., Black, N., & Smith, R. (1999). Effect of open peer review on quality of reviews and on reviewers’ recommendations: A randomised trial. BMJ, 318, 23–27.
Wager, E., & Jefferson, T. (2001). The shortcomings of peer review. Learned Publishing, 14, 257–263.
Wieder, T. (2009). The number of certain rankings and hierarchies formed from labeled or unlabeled elements and sets. Applied Mathematical Sciences, 3, 2707–2724.
Wolfe, D. A. (2009). Rank methods. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 342–347.
Wolfe, D. A. (2010). Ranked set sampling. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 460–466.
Zipf, G. K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. Cambridge, Mass.: Addison Wesley Press.
Acknowledgments
This paper is part of scientific activities in COST Action TD1306 New Frontiers of Peer Review (PEERE).
Appendix 1: Data analysis of unmanipulated reports
In this “Appendix”, the data analysis of the 10 Sept. reports without any data manipulation, i.e. without modifying the reports’ word content in any way, is presented. It is shown in Table 8 that the power-law exponent appears to be \(\simeq 0.781\pm 0.004\), with regression coefficient \(R^2 \simeq 0.989\), for the \(R_2\) report.
For \(R_4\), \(R_5\), and \(R_6\), the power-law exponent appears to be \(\simeq 0.70\pm 0.01\), with regression coefficients \(R^2\) in the range 0.938–0.966, while the other exponents are evenly spread between 0.423 and 0.683.
It was observed that \(R_2\), \(R_5\), and \(R_6\) are among the three longest reports; they should thus be less sensitive to “slight” data modifications.
In fact, this Table, through the obtained \(\alpha\) values, proves that for short reports one must pay close attention to the vocabulary; whence one has to distinguish the meanings of (short) words (like “a”, “c”, “k”).
In conclusion, it is highly meaningful to “adapt” (i.e. rewrite) the reports. The drawback, from a practical point of view, is that rewriting reports in a useful way takes time.
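The “adaptation” step could be automated in part with a substitution table applied before word counting. The sketch below is purely illustrative and hypothetical, not the authors’ actual substitution rules: it shows how ambiguous jargon tokens such as “C” (temperature vs. carbon) might be rewritten into unambiguous words.

```python
import re

# Hypothetical substitution table; the patterns and replacements are
# illustrative examples only, not the rules used in the paper.
SUBSTITUTIONS = {
    r"\b(\d+)\s*°?C\b": r"\1 celsius",  # temperature, e.g. "25 C"
    r"\bC-13\b": "carbon13",            # isotope notation
}


def adapt_report(text):
    """Rewrite ambiguous tokens so the word counter sees distinct words."""
    for pattern, repl in SUBSTITUTIONS.items():
        text = re.sub(pattern, repl, text)
    return text
```

A rule-based pass like this would still need manual review, which is why the appendix concludes that careful rewriting, though time-consuming, remains necessary.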
Cite this article
Ausloos, M., Nedic, O., Fronczak, A. et al. Quantifying the quality of peer reviewers through Zipf’s law. Scientometrics 106, 347–368 (2016). https://doi.org/10.1007/s11192-015-1704-5