

Quantifying the quality of peer reviewers through Zipf’s law


Abstract

This paper introduces a statistical analysis of peer reviewers in order to approach their “quality” through quantitative measures, thereby leading to quality metrics. Peer review reports for the Journal of the Serbian Chemical Society are examined. The text of each report first has to be adapted to word-counting software in order to avoid jargon-induced confusion when extracting word frequencies: e.g., “C” must be disambiguated depending on whether it means carbon or Celsius. Thus, every report has to be carefully “rewritten”. Thereafter, the quantity, variety, and distribution of words in each report are examined and compared with the whole set. Reports are split into two groups according to the month in which they arrived, in order to detect any possible hidden spurious effects; the two groups prove coherent. An empirical distribution is sought through a Zipf–Pareto rank-size law. It is observed that, in this respect, peer review reports lie very far from usual texts. Deviations from the usual (first) Zipf law are discussed. A theoretical suggestion for the “best (or worst) report”, and by extension the “good (or bad) reviewer”, within this context, is provided through an entropy argument, via the concept of “distance to average” behavior. Another entropy-based measure also allows one to characterize a journal’s reviews (whence its reviewers) for further comparison with other journals through their own review reports.
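The full methodology appears in the body of the article (not included in this preview). As a rough illustration of the kind of quantification the abstract describes, the sketch below counts word frequencies in one cleaned report, computes a normalized Shannon entropy, and measures an L1 “distance to average” against the pooled distribution of all reports. The function names and the choice of an L1 distance are illustrative assumptions, not the paper’s exact definitions.

```python
from collections import Counter
import math

def word_frequencies(text):
    """Count word occurrences in one (already 'rewritten') report."""
    return Counter(text.lower().split())

def normalized_entropy(freqs):
    """Shannon entropy of the word distribution, normalized by log(V),
    V being the vocabulary size, so that reports of different lengths
    are comparable.  Illustrative only: the paper's exact measure may
    differ."""
    total = sum(freqs.values())
    probs = [n / total for n in freqs.values()]
    h = -sum(p * math.log(p) for p in probs)
    v = len(freqs)
    return h / math.log(v) if v > 1 else 0.0

def distance_to_average(freqs, pooled):
    """L1 distance between one report's word distribution and the
    pooled ('average') distribution over all reports -- one plausible
    reading of the 'distance to average' idea (an assumption, not the
    paper's formula)."""
    t, tp = sum(freqs.values()), sum(pooled.values())
    vocab = set(freqs) | set(pooled)
    return sum(abs(freqs.get(w, 0) / t - pooled.get(w, 0) / tp) for w in vocab)
```

Under this reading, a report whose vocabulary usage is close to the pooled distribution scores a small distance, while a very terse or very repetitive report scores a large one.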




Acknowledgments

This paper is part of scientific activities in COST Action TD1306 New Frontiers of Peer Review (PEERE).

Author information


Correspondence to Marcel Ausloos.

Appendix 1: Data analysis of unmanipulated reports

This “Appendix” reports the data analysis of the 10 Sept. reports without any data manipulation, i.e. without modifying the reports in any way for their word content. Table 8 shows that, for the \(R_2\) report, the power-law exponent is \(\simeq 0.781\pm 0.004\), with regression coefficient \(R^2 \simeq 0.989\).

For \(R_4\), \(R_5\), and \(R_6\), the power-law exponent is \(\simeq 0.70\pm 0.01\), with regression coefficients \(R^2 \in [0.938, 0.966]\), while the exponents of the remaining reports spread evenly from 0.423 to 0.683.
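Assuming Eq. (4.1) is the standard rank-size form \(f(r) \simeq f_1\, r^{-\alpha}\) (the equation itself is given in the full text), exponents and \(R^2\) values of the kind quoted above can be obtained by ordinary least squares in log-log coordinates, as sketched below.

```python
import numpy as np

def fit_zipf(counts):
    """Least-squares fit of f(r) = f1 * r**(-alpha) on log-log axes.

    `counts` holds the word counts of one report; sorting them in
    decreasing order gives the rank-size sequence f(1) >= f(2) >= ...
    Returns (alpha, R_squared).  Assumes Eq. (4.1) is the standard
    rank-size power law -- see the full text for the exact form used.
    """
    f = np.sort(np.asarray(counts, dtype=float))[::-1]
    r = np.arange(1, len(f) + 1)
    x, y = np.log(r), np.log(f)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean())**2)
    return -slope, r2
```

With the word_frequencies helper sketched after the abstract, fit_zipf(list(word_frequencies(report).values())) would return an (\(\alpha\), \(R^2\)) pair of the kind listed in Table 8.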

It was observed that \(R_2\), \(R_5\), and \(R_6\) are the three longest reports, and should thus be less sensitive to “slight” data modifications.

In fact, the \(\alpha\) values found in this table show that, for short reports, one must pay close attention to the vocabulary; in particular, one has to distinguish the possible meanings of short words (like “a”, “c”, “k”).
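The kind of rewriting meant here can be pictured as a set of context-dependent substitution rules applied before counting. The rules below are hypothetical examples for the tokens just mentioned; the actual JSCS reports were rewritten by hand, not by these regexes.

```python
import re

# Hypothetical disambiguation rules: map ambiguous short tokens to
# unambiguous words before counting.  These only illustrate the idea;
# they are not the rules actually applied to the JSCS reports.
RULES = [
    (re.compile(r"\b(\d+)\s*°?\s*C\b"), r"\1 celsius"),  # "300 C" -> temperature
    (re.compile(r"\bC\b"), "carbon"),                    # bare "C" -> the element
    (re.compile(r"\bK\b"), "kelvin"),                    # bare "K" (could also mean potassium!)
]

def rewrite(report_text):
    """Apply the substitution rules in order; earlier rules win, since
    their replacements no longer match the later patterns."""
    for pattern, replacement in RULES:
        report_text = pattern.sub(replacement, report_text)
    return report_text

print(rewrite("heated to 300 C; the C content and the K signal"))
# -> "heated to 300 celsius; the carbon content and the kelvin signal"
```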

In conclusion, it is highly meaningful to “adapt” (i.e. rewrite) the reports. The drawback is that this does not simplify matters from a scientific point of view, since rewriting reports in a useful way takes time.

Table 8 Power-law fit parameters, Eq. (4.1), for the 10 Sept. “unmanipulated” reports \(R_i\)


About this article


Cite this article

Ausloos, M., Nedic, O., Fronczak, A. et al. Quantifying the quality of peer reviewers through Zipf’s law. Scientometrics 106, 347–368 (2016). https://doi.org/10.1007/s11192-015-1704-5

